What is the best way to find out how two DataFrames differ based on a combination of multiple columns? Suppose I have the following:
df1:
   A  B  C
0  1  2  3
1  3  4  2

df2:
   A  B  C
0  1  2  3
1  3  5  2
I want to show all the rows where there is a difference, such as (3, 4, 2) vs. (3, 5, 2) in the example above. I tried pd.merge(), thinking that an outer join using all of the columns as keys would give me a DataFrame that helps me get what I want, but it does not turn out that way.
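For reference, here is a minimal sketch of the outer-merge idea described above. It assumes the two frames share all of their columns and relies on the indicator option of merge, which the original attempt may not have used. The sample data is the example from this question.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 3], "B": [2, 4], "C": [3, 2]})
df2 = pd.DataFrame({"A": [1, 3], "B": [2, 5], "C": [3, 2]})

# Outer join on all shared columns (A, B, C). With indicator=True,
# pandas adds a "_merge" column recording where each row came from:
# "left_only", "right_only", or "both".
merged = df1.merge(df2, how="outer", indicator=True)

# Keep the rows that appear in only one of the two frames.
only_one_side = merged[merged["_merge"] != "both"]
print(only_one_side)
```

Rows marked left_only exist only in df1 and rows marked right_only exist only in df2, so a changed row shows up once for each side.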
Thanks to Adkum, I was able to use a mask built from a boolean comparison, as below, but first I had to make sure the indexes were comparable.
df1 = df1.set_index('A')
df2 = df2.set_index('A')  # this gave me a good index using one of the keys
# if the frames have different rows, I get nulls after reindexing
df1 = df1.reindex_like(df2)
df1[~(df1 == df2).all(axis=1)]  # this gave me all the rows that differ
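A self-contained sketch of the alignment step above, assuming column A is a unique key in both frames. The extra row with key 7 and the different row order in df2 are made-up data, added only to show what reindex_like does with a key that is missing from df1.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 3], "B": [2, 4], "C": [3, 2]})
# Hypothetical data: same keys in a different order, plus an extra key (7).
df2 = pd.DataFrame({"A": [3, 1, 7], "B": [5, 2, 0], "C": [2, 3, 0]})

left = df1.set_index("A")
right = df2.set_index("A")

# Align left to right's index and columns. Keys that are missing
# from left become rows filled with NaN.
left = left.reindex_like(right)

# NaN never compares equal, so missing keys are reported as differing too.
differing = left[~(left == right).all(axis=1)]
print(differing)
```

Without the reindex_like call, comparing the two frames with == would raise an error here, because pandas only compares identically labeled DataFrames.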
We can use .all and pass axis=1 to compare row by row. We can then use the resulting boolean index to show the rows that differ by negating it with ~:
In [43]: df[~(df == df1).all(axis=1)]
Out[43]:
   A  B  C
1  3  4  2
Breaking it down:

In [44]: df == df1
Out[44]:
      A      B     C
0  True   True  True
1  True  False  True

In [45]: (df == df1).all(axis=1)
Out[45]:
0     True
1    False
dtype: bool
We can then pass the above to df as a boolean mask and invert it with ~.
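The steps above can be put together as a short runnable sketch. The helper name rows_that_differ is made up for illustration and is not part of pandas. It assumes both frames already have identical index and column labels.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 3], "B": [2, 4], "C": [3, 2]})
df1 = pd.DataFrame({"A": [1, 3], "B": [2, 5], "C": [3, 2]})


def rows_that_differ(left, right):
    """Hypothetical helper: rows of `left` that differ from `right` in any column."""
    # True for each row where every column matches.
    same_row = (left == right).all(axis=1)
    # Invert the mask to keep only the rows with at least one mismatch.
    return left[~same_row]


print(rows_that_differ(df, df1))
```

Swapping the arguments returns the corresponding rows from the other frame, which makes it easy to view both versions of a changed row.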