Wednesday, 15 January 2014

python - Pandas aligning multiple dataframes with TimeStamp index -



python - Pandas aligning multiple dataframes with TimeStamp index -

this has been bane of life past couple of days. have numerous pandas dataframes contain time series info irregular frequencies. seek align these single dataframe.

below code, representative dataframes, df1, df2, , df3 ( have n=5, , appreciate solution work n>2):

# df1, df2, df3 given @ bottom import pandas pd import datetime # can align df1 df2 df1aligned, df2aligned = df1.align(df2) # , concatenate single dataframe combined_1_n_2 = pd.concat([df1aligned, df2aligned], axis =1 ) # since don't know better, seek align df3 combined_1_n_2 manually: combined_1_n_2.align(df3) error: reindexing valid uniquely valued index objects

i have thought why error, rid of duplicate indices in combined_1_n_2 , seek again:

combined_1_n_2 = combined_1_n_2.groupby(combined_1_n_2.index).first() combined_1_n_2.align(df3) # stll same error error: reindexing valid uniquely valued index objects

why getting error? if worked, manual , ugly. how can align >2 time series , combine them in single dataframe?

data:

df1 = pd.dataframe( {'price' : [62.1250,62.2500,62.2375,61.9250,61.9125 ]}, index = [pd.datetimeindex([datetime.datetime.strptime(s, '%y-%m-%d %h:%m:%s.%f')])[0] s in ['2008-06-01 06:03:59.614000', '2008-06-01 06:03:59.692000', '2008-06-01 06:15:42.004000', '2008-06-01 06:15:42.083000','2008-06-01 06:17:01.654000' ] ]) df2 = pd.dataframe({'price': [241.0625, 241.5000, 241.3750, 241.2500, 241.3750 ]}, index = [pd.datetimeindex([datetime.datetime.strptime(s, '%y-%m-%d %h:%m:%s.%f')])[0] s in ['2008-06-01 06:13:34.524000', '2008-06-01 06:13:34.602000', '2008-06-01 06:15:05.399000', '2008-06-01 06:15:05.399000','2008-06-01 06:15:42.082000' ] ]) df3 = pd.dataframe({'price': [67.656, 67.875, 67.8125, 67.75, 67.6875 ]}, index = [pd.datetimeindex([datetime.datetime.strptime(s, '%y-%m-%d %h:%m:%s.%f')])[0] s in ['2008-06-01 06:03:52.281000', '2008-06-01 06:03:52.359000', '2008-06-01 06:13:34.848000', '2008-06-01 06:13:34.926000','2008-06-01 06:15:05.321000' ] ])

your specific error due column names of combined_1_n_2 having duplicates (both columns named 'price'). rename columns , sec align work.

one alternative way chain join operator, merges frames on index, below.

in [23]: df1.join(df2, how='outer', rsuffix='_1').join(df3, how='outer', rsuffix='_2') out[23]: cost price_1 price_2 2008-06-01 06:03:52.281000 nan nan 67.6560 2008-06-01 06:03:52.359000 nan nan 67.8750 2008-06-01 06:03:59.614000 62.1250 nan nan 2008-06-01 06:03:59.692000 62.2500 nan nan 2008-06-01 06:13:34.524000 nan 241.0625 nan 2008-06-01 06:13:34.602000 nan 241.5000 nan 2008-06-01 06:13:34.848000 nan nan 67.8125 2008-06-01 06:13:34.926000 nan nan 67.7500 2008-06-01 06:15:05.321000 nan nan 67.6875 2008-06-01 06:15:05.399000 nan 241.3750 nan 2008-06-01 06:15:05.399000 nan 241.2500 nan 2008-06-01 06:15:42.004000 62.2375 nan nan 2008-06-01 06:15:42.082000 nan 241.3750 nan 2008-06-01 06:15:42.083000 61.9250 nan nan 2008-06-01 06:17:01.654000 61.9125 nan nan

python pandas concatenation time-series

No comments:

Post a Comment