R - plyr or dplyr in Python
This is more of a conceptual question; I do not have a specific problem.
I am learning Python for data analysis, but I am very familiar with R. One of the great things about R is plyr (and of course ggplot2), and even better, dplyr. pandas of course has split-apply as well, but in R I can do things like the following (in dplyr; it is a bit different in plyr, and I can see how dplyr mimics the . notation from object-oriented programming):
    info %.% group_by(c(.....)) %.% summarise(new1 = ...., new2 = ...., ..... newn=....)
in which I create multiple summary calculations at the same time.
How do I do that in Python? Because
    df[...].groupby(.....).sum()
only sums columns, while in R I can have one mean, one sum, one special function, etc. in one call.
I realize I can do all my operations separately and merge them, and that is fine if I am using Python. But when it comes down to choosing a tool, any line of code you do not have to type, check, and validate adds up in time.
In addition, in dplyr you can add mutate statements as well, so it seems to me to be way more powerful. So what am I missing about pandas or Python?
My goal is to learn. I have spent a lot of effort learning Python and it is a worthy investment, but the question still remains.
Thanks in advance.
I think you're looking for the agg function, which is applied to groupby objects.
From the docs:
    In [48]: grouped = df.groupby('a')
    In [49]: grouped['c'].agg([np.sum, np.mean, np.std])
    Out[49]:
              sum      mean       std
    bar  0.443469  0.147823  0.301765
    foo  2.529056  0.505811  0.96
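To get closer to the summarise(new1 = ..., new2 = ...) style from the question, here is a minimal sketch using named aggregation, assuming a reasonably recent pandas (0.25 or later). The data frame, its column names (a, c, d), and the output names (c_mean, d_sum, c_range, ratio) are all made up for illustration. The assign call at the end is a rough stand-in for mutate, not an exact equivalent.

```python
import pandas as pd

# Hypothetical toy data; the column names are invented for this example.
df = pd.DataFrame({
    "a": ["foo", "foo", "bar", "bar"],
    "c": [1.0, 3.0, 2.0, 6.0],
    "d": [10, 20, 30, 40],
})

# Roughly summarise(): several differently named summaries in one call,
# each given as output_name=(input_column, function).
summary = df.groupby("a").agg(
    c_mean=("c", "mean"),
    d_sum=("d", "sum"),
    c_range=("c", lambda s: s.max() - s.min()),  # any custom function works
)

# Roughly mutate(): add a derived column, chained onto the summary.
summary = summary.assign(ratio=lambda t: t["d_sum"] / t["c_mean"])

print(summary)
```

Each keyword names an output column, so one mean, one sum, and one special function can share a single call, and further assign steps can be chained on the result without a separate merge.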