My Blog: R, select rows according to the rank of a certain column

Friday, 15 July 2011

R, select rows according to the rank of a certain column

I have the R data frame below:

name  score
marry    98
marry    77
marry    87
marry    96
mark     99
mark     44
mark     79
john     87
john     77

For each name, I want to select the rows with the two highest scores. The result should be:

name  score
marry    98
marry    96
mark     99
mark     79
john     87
john     77

Could anyone help? Many thanks!

Here's a possible base R approach:

mydf[with(mydf, ave(-score, name, FUN = order)) %in% c(1, 2), ]
#    name score
# 1 marry    98
# 4 marry    96
# 5  mark    99
# 7  mark    79
# 8  john    87
# 9  john    77
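One caveat about the line above: order() returns a sorting permutation rather than ranks, and the two only happen to coincide on the sample data. A minimal sketch of a rank-based variant follows, using a made-up group of scores (not from the post) where the two differ:

```r
# Hypothetical data (not from the post) where order() and rank() disagree.
mydf <- data.frame(
  name  = c("a", "a", "a", "a"),
  score = c(50, 90, 70, 80)
)

# order(-score) is 2 4 3 1 here: a permutation of row positions, not ranks.
by_order <- mydf[with(mydf, ave(-score, name, FUN = order)) %in% c(1, 2), ]

# rank() gives each row's rank within its group; ties.method = "first"
# breaks ties by row order, so exactly two rows per group are kept.
by_rank <- mydf[with(mydf, ave(-score, name,
  FUN = function(x) rank(x, ties.method = "first"))) %in% c(1, 2), ]

by_order$score  # 50 80
by_rank$score   # 90 80
```

The rank-based version costs a little more per group, so the timings reported for the order-based function would not carry over to it unchanged.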

For the curious, on timings, here's a little test.

Two sample datasets, both with 1 million rows and 2 columns: one with 1,000 possible values for "name", the other with 10,000 possible values.

set.seed(1)
df1 <- data.frame(
  name = sample(1000, 1000000, TRUE),
  score = sample(0:100, 1000000, TRUE)
)
df2 <- data.frame(
  name = sample(10000, 1000000, TRUE),
  score = sample(0:100, 1000000, TRUE)
)

The functions to benchmark (I'll try to add "dplyr" later, after I reinstall it):

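library(data.table)

# Base R: per-group ordering via ave()
fun1 <- function(mydf) {
  mydf[with(mydf, ave(-score, name, FUN = order)) %in% c(1, 2), ]
}

# data.table: sort inside `[`, then take the first two rows per group
fun2 <- function(mydf) {
  as.data.table(mydf)[order(-score), .SD[1:2], by = name]
}

# data.table: sort by reference with setorder(), then head() per group
fun3 <- function(mydf) {
  df <- as.data.table(mydf)
  setorder(df, -score)[, head(.SD, 2), by = name]
}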
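The "dplyr" variant is not shown in the post. A minimal sketch of what it might look like, assuming dplyr 1.0.0 or later (for slice_max()); the name fun4 is hypothetical, and this function was not part of the benchmark:

```r
library(dplyr)

# Sample data from the question.
mydf <- data.frame(
  name  = c("marry", "marry", "marry", "marry", "mark", "mark", "mark", "john", "john"),
  score = c(98, 77, 87, 96, 99, 44, 79, 87, 77)
)

# Hypothetical fun4: keep the two highest-scoring rows per name.
# with_ties = FALSE guarantees at most two rows per group.
fun4 <- function(mydf) {
  mydf %>%
    group_by(name) %>%
    slice_max(score, n = 2, with_ties = FALSE) %>%
    ungroup()
}

res <- fun4(mydf)
```

Without with_ties = FALSE, slice_max() keeps every row tied for the cut-off score, which may return more than two rows for a group.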

The benchmarking:

library(microbenchmark)
microbenchmark(fun1(df1), fun2(df1), fun3(df1),
               fun1(df2), fun2(df2), fun3(df2), times = 20)
# Unit: milliseconds
#       expr        min         lq       mean     median         uq       max neval
#  fun1(df1)  502.76809  513.98317  569.47883  597.90488  603.34458  686.4302    20
#  fun2(df1)  733.12544  741.18777  796.67106  822.60824  828.88449  839.3837    20
#  fun3(df1)   87.80581   93.07012   95.34281   95.56374   97.49608  101.7991    20
#  fun1(df2)  672.60241  764.10237  764.60365  772.33959  780.14679  799.3505    20
#  fun2(df2) 6338.14881 6360.42621 6407.66675 6412.99278 6451.75626 6479.2681    20
#  fun3(df2)  354.24119  366.47396  382.58666  369.78597  374.01897  468.9197    20
