Monday, 15 June 2015

python - Compare pandas dataframes by multiple columns


What is the best way to find out how two data frames differ based on a combination of multiple columns? Suppose I have the following:

df1:

     A  B  C
  0  1  2  3
  1  3  4  2

df2:

     A  B  C
  0  1  2  3
  1  3  5  2

I want to show all the rows where there is a difference: (3,4,2) vs (3,5,2) in the example above. I tried pd.merge(), thinking that if I used all the columns as keys for an outer join I would end up with a dataframe that gave me what I want, but it did not turn out that way.
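For reference, here is a minimal sketch of how the outer-join idea could be made to work. It relies on the `indicator` argument of `merge`, which is not mentioned in the original post and is only available in newer pandas versions, so treat it as an assumption rather than what the asker actually tried. The names `merged` and `diff_rows` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 3], "B": [2, 4], "C": [3, 2]})
df2 = pd.DataFrame({"A": [1, 3], "B": [2, 5], "C": [3, 2]})

# Outer join on every column; the extra "_merge" column records
# whether a row was found in the left frame, the right frame, or both.
merged = df1.merge(df2, how="outer", on=["A", "B", "C"], indicator=True)

# Rows found in only one of the two frames are the rows that differ.
diff_rows = merged[merged["_merge"] != "both"]
print(diff_rows)
```

Unlike the boolean mask approach, this compares rows by value and ignores their position, so it does not need the two frames to share an index.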

Thanks to Adkum I was able to use a mask from a boolean difference, as below, but first I had to make sure that the indexes were comparable.

  df1 = df1.set_index('A')
  df2 = df2.set_index('A')
  # this gave me a good index using one of the keys
  # if there are different rows, I get nulls
  df1 = df1.reindex_like(df2)
  df1[~(df1 == df2).all(axis=1)]  # this gave me all the rows that are different
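The alignment step matters because `==` between two DataFrames raises an error when their labels are not identical. The sketch below reproduces the approach on the example data, with `df2` deliberately stored in a different row order (an assumption made here for illustration; the name `df1_aligned` is also illustrative).

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 3], "B": [2, 4], "C": [3, 2]}).set_index("A")
# Assumption for illustration: df2 holds the same keys in a different order.
df2 = pd.DataFrame({"A": [3, 1], "B": [5, 2], "C": [2, 3]}).set_index("A")

# Give df1 the same index and columns as df2.
# Keys present in df2 but missing from df1 would become rows of NaN.
df1_aligned = df1.reindex_like(df2)

# Keep the rows of df1 where at least one column differs from df2.
differing = df1_aligned[~(df1_aligned == df2).all(axis=1)]
print(differing)
```

Here the only row kept is the one with key A = 3, where B is 4 in `df1` and 5 in `df2`.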

We can use .all and pass axis=1 to compare row by row. We can then use the resulting boolean index to show the rows that differ by negating it with ~:

  In [43]: df[~(df == df1).all(axis=1)]
  Out[43]:
     A  B  C
  1  3  4  2

Breaking it down:

  In [44]: df == df1
  Out[44]:
        A      B     C
  0  True   True  True
  1  True  False  True

  In [45]: (df == df1).all(axis=1)
  Out[45]:
  0     True
  1    False
  dtype: bool

We can then pass the above to df as a boolean index, inverting it with ~.
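Putting the steps together, a self-contained sketch on the example data might look like the following. The side-by-side view built with `pd.concat` is an addition for illustration and is not part of the original answer; the names `row_equal`, `mask` and `both` are likewise illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 3], "B": [2, 4], "C": [3, 2]})
df1 = pd.DataFrame({"A": [1, 3], "B": [2, 5], "C": [3, 2]})

# Element-wise comparison, then collapse each row to a single boolean.
row_equal = (df == df1).all(axis=1)

# Invert: True marks rows where at least one column differs.
mask = ~row_equal

# Show the differing rows from both frames next to each other.
both = pd.concat([df[mask], df1[mask]], axis=1, keys=["df", "df1"])
print(both)
```

This assumes both frames already share the same index and columns; if they do not, align them first (for example with reindex_like) before comparing.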

