Friday, 15 March 2013

python - Merge pandas dataframe, with column operation -




I searched the archive but did not find what I wanted (probably because I don't know the right keywords to use).

Here is my problem: I have a bunch of dataframes that need to be merged; I also want to update the values of a subset of columns with their sum across the dataframes.

For example, I have two dataframes, df1 and df2:

    df1 = pd.DataFrame([[1, 2], [1, 3], [0, 4]], columns=["a", "b"])
    df2 = pd.DataFrame([[1, 5], [0, 6]], index=[0, 2], columns=["a", "b"])

       a  b         a  b
    0  1  2      0  1  5
    1  1  3      2  0  6
    2  0  4

After merging, I'd like column 'b' to be updated to the sum of the matched records, while column 'a' should stay as it was in df1 (or df2, I don't care which) before the merge:

       a   b
    0  1   7
    1  1   3
    2  0  10

Now, extend this to merging three or more data frames.

Are there any straightforward, built-in tricks to do this? Or do I need to process them one by one, line by line?

===== edit / clarification =====

In the real-world case, each data frame may contain indexes that are not in the other data frames. In that case, the merged data frame should have all of them, and update only the shared entries/indexes with the sum (or some other operation).
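One possible way to express this requirement, offered as a sketch rather than a definitive answer: stack the frames row-wise and group by index label, keeping the first 'a' and summing 'b'. The frames df1 and df2 are the ones printed in the question; df_extra is a hypothetical third frame added only to show an index (3) that the others lack.

```python
import pandas as pd

# df1 and df2 as printed in the question.
df1 = pd.DataFrame([[1, 2], [1, 3], [0, 4]], columns=["a", "b"])
df2 = pd.DataFrame([[1, 5], [0, 6]], index=[0, 2], columns=["a", "b"])
# Hypothetical third frame (not in the original post); index 3 is new.
df_extra = pd.DataFrame([[0, 1], [1, 9]], index=[2, 3], columns=["a", "b"])

frames = [df1, df2, df_extra]

# Stack all rows, then group the rows that share an index label:
# keep the first 'a' seen for each label and sum the 'b' values.
stacked = pd.concat(frames)
merged = stacked.groupby(level=0).agg({"a": "first", "b": "sum"})

print(merged)
```

Any other aggregation could replace "sum" in the dictionary passed to agg, and the list can hold as many frames as needed. This assumes the index labels are what identify matching records.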

This is only a partial solution, not a finished one yet, but the main point is solved:

    df3 = pd.concat([df1, df2], join="outer", axis=1)
    df4 = df3.b.sum(axis=1)

df3 has two 'a' columns and two 'b' columns. Calling sum() on df3.b adds the two 'b' columns together and ignores NaNs. df4 holds the sum of the 'b' columns of df1 and df2, and has all the indexes.
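The behaviour described above can be checked on the sample frames. This is a minimal sketch that assumes df1 and df2 hold the values printed in the question (df2 indexed by 0 and 2):

```python
import pandas as pd

# Sample frames as printed in the question.
df1 = pd.DataFrame([[1, 2], [1, 3], [0, 4]], columns=["a", "b"])
df2 = pd.DataFrame([[1, 5], [0, 6]], index=[0, 2], columns=["a", "b"])

# Side-by-side concat on the union of indexes; the columns come out
# as a, b, a, b, and row 1 is NaN in the columns taken from df2.
df3 = pd.concat([df1, df2], join="outer", axis=1)
print(df3)

# Because the label 'b' is duplicated, df3["b"] is a DataFrame holding
# both 'b' columns. sum(axis=1) adds them row by row, skipping NaN.
df4 = df3["b"].sum(axis=1)
print(df4)
```

Note that df4 here is a Series of floats, because the NaN in row 1 forces the second 'b' column to a float dtype.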

This did not solve column 'a', though. In my real case there are quite a few NaNs in df3.a, while the other values in df3.a should all be the same. I haven't found a straightforward way to create a column 'a' in df4 and fill it with the non-NaN value. I am now searching for a "count" function that gives the occurrences of elements in the rows of df3.a (imagine it has a few dozen 'a' columns).
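A sketch of one way to fill 'a' from the side-by-side frame, assuming the first non-NaN value in each row is acceptable: back-fill along each row and take the first column. A per-row count of distinct values (nunique) can stand in for the "count" check. The names df_extra, a_cols, a_first, a_distinct and result are hypothetical, and df_extra is an invented third frame used only to produce rows whose first 'a' column is NaN.

```python
import pandas as pd

# df1 and df2 as printed in the question; df_extra is hypothetical.
df1 = pd.DataFrame([[1, 2], [1, 3], [0, 4]], columns=["a", "b"])
df2 = pd.DataFrame([[1, 5], [0, 6]], index=[0, 2], columns=["a", "b"])
df_extra = pd.DataFrame([[0, 1], [1, 9]], index=[2, 3], columns=["a", "b"])

df3 = pd.concat([df1, df2, df_extra], join="outer", axis=1)

# All 'a' columns at once (the label is duplicated, so this is a DataFrame).
a_cols = df3["a"]

# Back-fill along each row so the first column ends up holding the
# first non-NaN value of that row.
a_first = a_cols.bfill(axis=1).iloc[:, 0]

# Number of distinct non-NaN values per row; 1 means all frames agree.
a_distinct = a_cols.nunique(axis=1)

result = pd.DataFrame({
    "a": a_first.astype(int),        # safe: every row has at least one value
    "b": df3["b"].sum(axis=1),       # NaN-skipping row-wise sum
})
print(result)
print(a_distinct)
```

Rows where a_distinct is greater than 1 would be the ones where the frames disagree on 'a' and need a closer look.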

python pandas merge dataframes
