python - Merge pandas dataframes, with a column operation
I searched the archive but did not find what I wanted (probably because I don't know which keywords to use).
Here is the problem: I have a bunch of dataframes that need to be merged; I also want to update the values of a subset of columns with the sum across the dataframes.
For example, I have 2 dataframes, df1 and df2:
    df1 = pd.DataFrame([[1, 2], [1, 3], [0, 4]], columns=["a", "b"])
    df2 = pd.DataFrame([[1, 5], [0, 6]], columns=["a", "b"], index=[0, 2])

    df1:           df2:
       a  b           a  b
    0  1  2        0  1  5
    1  1  3        2  0  6
    2  0  4
After merging, I'd like column 'b' to be updated with the sum of the matched records, while column 'a' should stay the same as in df1 (or df2, I don't care), as before:

       a   b
    0  1   7
    1  1   3
    2  0  10
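For just two frames, one possible sketch (not necessarily the best way) is to align on the index and handle each column separately. It assumes df2 holds rows 0 and 2, as in the tables above, and that missing 'b' values should count as 0; the name `merged` is made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [1, 3], [0, 4]], columns=["a", "b"])
# Assumption: df2 is indexed 0 and 2, matching the tables in the question.
df2 = pd.DataFrame([[1, 5], [0, 6]], columns=["a", "b"], index=[0, 2])

merged = pd.DataFrame({
    # Take 'a' from df1 and fall back to df2 where df1 has no row.
    "a": df1["a"].combine_first(df2["a"]).astype(int),
    # Sum 'b' over matching index labels; a missing side counts as 0.
    "b": df1["b"].add(df2["b"], fill_value=0).astype(int),
})
print(merged)
```

`Series.add` with `fill_value=0` keeps row 1, which exists only in df1, instead of turning it into NaN.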
Now, expand this merging to 3 or more dataframes.
Are there any straightforward, built-in tricks to do this? Or do I need to process them one by one, line by line?
===== edit / clarification =====
In the real-world example, each dataframe may contain indexes that are not in the other dataframes. In that case, the merged dataframe should have all of them, and update the shared entries/indexes with the sum (or some other operation).
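One way this could be approached for any number of frames with partly overlapping indexes is to stack them vertically and group by the index label. This is a sketch under assumptions: `df_extra` is a hypothetical third frame invented for the example, and "first" is used for 'a' on the assumption that its values agree wherever they overlap.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [1, 3], [0, 4]], columns=["a", "b"])
df2 = pd.DataFrame([[1, 5], [0, 6]], columns=["a", "b"], index=[0, 2])
# Hypothetical third frame: shares index 1 and adds a new index 5.
df_extra = pd.DataFrame([[1, 1], [2, 8]], columns=["a", "b"], index=[1, 5])

frames = [df1, df2, df_extra]

# Stack all rows, then collapse rows that share an index label.
stacked = pd.concat(frames)
merged = stacked.groupby(level=0).agg({"a": "first", "b": "sum"})
print(merged)
```

Swapping "sum" for another aggregation name such as "max" or "mean" would give the "other operation" case, and the list of frames can be as long as needed.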
This is only a partial solution, not a finished one yet, but the main point is solved:

    df3 = pd.concat([df1, df2], join="outer", axis=1)
    df4 = df3.b.sum(axis=1)
df3 has 2 'a' columns and 2 'b' columns. The sum() function on df3.b adds the 2 'b' columns together and ignores NaNs. df4 holds the sum of df1's and df2's 'b' columns, for all the indexes.
This does not solve column 'a', though. In the real case, there are quite a few NaNs in df3.a, while the other values in df3.a should all be the same. I haven't found a straightforward way to create column 'a' in df4 and fill it with the non-NaN value. I am searching for a "count" function that gives the occurrences of elements in the rows of df3.a (imagine it has a few dozen 'a' columns).
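A possible sketch for the remaining 'a' column: back-fill across the duplicated 'a' columns so the first column ends up holding the first non-NaN value of each row, and use nunique to check that the non-NaN values really agree. The names `a_cols`, `a_filled` and `n_distinct` are made up for illustration, and df2 is assumed to be indexed 0 and 2 as in the tables.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [1, 3], [0, 4]], columns=["a", "b"])
df2 = pd.DataFrame([[1, 5], [0, 6]], columns=["a", "b"], index=[0, 2])

df3 = pd.concat([df1, df2], join="outer", axis=1)

# Selecting a duplicated column name returns a DataFrame of all 'a' columns.
a_cols = df3["a"]

# Back-fill along each row, then keep the first column:
# this is the first non-NaN 'a' value in every row.
a_filled = a_cols.bfill(axis=1).iloc[:, 0]

# Number of distinct non-NaN values per row; more than 1 means a conflict.
n_distinct = a_cols.nunique(axis=1)

df4 = pd.DataFrame({
    "a": a_filled.astype(int),
    "b": df3["b"].sum(axis=1).astype(int),
})
print(df4)
```

If `n_distinct` is above 1 for some row, the frames disagree on 'a' there, and a different rule (for example the most frequent value) would have to be chosen.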
python pandas merge dataframes