python - Pandas aligning multiple dataframes with TimeStamp index -
this has been bane of life past couple of days. have numerous pandas dataframes contain time series info irregular frequencies. seek align these single dataframe.
below code, representative dataframes, df1
, df2
, , df3
( have n=5, , appreciate solution work n>2
):
# df1, df2, df3 given @ bottom import pandas pd import datetime # can align df1 df2 df1aligned, df2aligned = df1.align(df2) # , concatenate single dataframe combined_1_n_2 = pd.concat([df1aligned, df2aligned], axis =1 ) # since don't know better, seek align df3 combined_1_n_2 manually: combined_1_n_2.align(df3) error: reindexing valid uniquely valued index objects
i have thought why error, rid of duplicate indices in combined_1_n_2
, seek again:
combined_1_n_2 = combined_1_n_2.groupby(combined_1_n_2.index).first() combined_1_n_2.align(df3) # stll same error error: reindexing valid uniquely valued index objects
why getting error? if worked, manual , ugly. how can align >2 time series , combine them in single dataframe?
data:
df1 = pd.dataframe( {'price' : [62.1250,62.2500,62.2375,61.9250,61.9125 ]}, index = [pd.datetimeindex([datetime.datetime.strptime(s, '%y-%m-%d %h:%m:%s.%f')])[0] s in ['2008-06-01 06:03:59.614000', '2008-06-01 06:03:59.692000', '2008-06-01 06:15:42.004000', '2008-06-01 06:15:42.083000','2008-06-01 06:17:01.654000' ] ]) df2 = pd.dataframe({'price': [241.0625, 241.5000, 241.3750, 241.2500, 241.3750 ]}, index = [pd.datetimeindex([datetime.datetime.strptime(s, '%y-%m-%d %h:%m:%s.%f')])[0] s in ['2008-06-01 06:13:34.524000', '2008-06-01 06:13:34.602000', '2008-06-01 06:15:05.399000', '2008-06-01 06:15:05.399000','2008-06-01 06:15:42.082000' ] ]) df3 = pd.dataframe({'price': [67.656, 67.875, 67.8125, 67.75, 67.6875 ]}, index = [pd.datetimeindex([datetime.datetime.strptime(s, '%y-%m-%d %h:%m:%s.%f')])[0] s in ['2008-06-01 06:03:52.281000', '2008-06-01 06:03:52.359000', '2008-06-01 06:13:34.848000', '2008-06-01 06:13:34.926000','2008-06-01 06:15:05.321000' ] ])
your specific error due column names of combined_1_n_2
having duplicates (both columns named 'price'). rename columns , sec align work.
one alternative way chain join
operator, merges frames on index, below.
in [23]: df1.join(df2, how='outer', rsuffix='_1').join(df3, how='outer', rsuffix='_2') out[23]: cost price_1 price_2 2008-06-01 06:03:52.281000 nan nan 67.6560 2008-06-01 06:03:52.359000 nan nan 67.8750 2008-06-01 06:03:59.614000 62.1250 nan nan 2008-06-01 06:03:59.692000 62.2500 nan nan 2008-06-01 06:13:34.524000 nan 241.0625 nan 2008-06-01 06:13:34.602000 nan 241.5000 nan 2008-06-01 06:13:34.848000 nan nan 67.8125 2008-06-01 06:13:34.926000 nan nan 67.7500 2008-06-01 06:15:05.321000 nan nan 67.6875 2008-06-01 06:15:05.399000 nan 241.3750 nan 2008-06-01 06:15:05.399000 nan 241.2500 nan 2008-06-01 06:15:42.004000 62.2375 nan nan 2008-06-01 06:15:42.082000 nan 241.3750 nan 2008-06-01 06:15:42.083000 61.9250 nan nan 2008-06-01 06:17:01.654000 61.9125 nan nan
python pandas concatenation time-series
No comments:
Post a Comment