R: select rows according to the rank of a certain column
I have the R data frame below:

    name  score
    marry    98
    marry    77
    marry    87
    marry    96
    mark     99
    mark     44
    mark     79
    john     87
    john     77

For each name, I want to select the rows with the two highest scores. The result should be:

    name  score
    marry    98
    marry    96
    mark     99
    mark     79
    john     87
    john     77

Could anyone help? Many thanks!
Here's a possible base R approach:

    mydf[with(mydf, ave(-score, name, FUN = order)) %in% c(1, 2), ]
    #    name score
    # 1 marry    98
    # 4 marry    96
    # 5  mark    99
    # 7  mark    79
    # 8  john    87
    # 9  john    77
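One caveat with the `ave()` approach above: `order()` returns the permutation that sorts a group, not each row's rank. The two coincide for the sample data, but not in general (a group with scores 50, 90, 70 is a counterexample). Here is a minimal sketch of a rank-based variant, assuming ties should be broken by row order. The names `rank_in_group` and `top2` are only for illustration.

```r
# Sample data from the question.
mydf <- data.frame(
  name  = c("marry", "marry", "marry", "marry",
            "mark", "mark", "mark", "john", "john"),
  score = c(98, 77, 87, 96, 99, 44, 79, 87, 77)
)

# Rank scores within each name (1 = highest score).
# ties.method = "first" is an assumption: equal scores are ranked by row order.
rank_in_group <- with(
  mydf,
  ave(-score, name, FUN = function(x) rank(x, ties.method = "first"))
)
top2 <- mydf[rank_in_group <= 2, ]

# A group where order() and rank() disagree: scores 50, 90, 70.
x <- -c(50, 90, 70)
order(x)                        # 2 3 1 (positions of the sorted values)
rank(x, ties.method = "first")  # 3 1 2 (rank of each value)
```

With `order()`, the 50/90/70 group would keep the rows scoring 50 and 70; with `rank()` it keeps 90 and 70, which is what the question asks for.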
For the curious, on timings, here's a little test...

Two sample datasets, both with 1M rows and 2 columns: one with 1,000 possible values for "name", and the other with 10,000 possible values.
    set.seed(1)
    df1 <- data.frame(
      name = sample(1000, 1000000, TRUE),
      score = sample(0:100, 1000000, TRUE)
    )
    df2 <- data.frame(
      name = sample(10000, 1000000, TRUE),
      score = sample(0:100, 1000000, TRUE)
    )
The functions to benchmark. I'll try to add "dplyr" later, after I reinstall it.
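    library(data.table)  # needed for fun2 and fun3

    # Base R: position of each score within its name group, keep the first two
    fun1 <- function(mydf) {
      mydf[with(mydf, ave(-score, name, FUN = order)) %in% c(1, 2), ]
    }

    # data.table: sort by descending score, take the first two rows per name
    fun2 <- function(mydf) {
      as.data.table(mydf)[order(-score), .SD[1:2], by = name]
    }

    # data.table: sort by reference with setorder(), then head() per name
    fun3 <- function(mydf) {
      DF <- as.data.table(mydf)
      setorder(DF, -score)[, head(.SD, 2), by = name]
    }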
the benchmarking.
    library(microbenchmark)
    microbenchmark(fun1(df1), fun2(df1), fun3(df1),
                   fun1(df2), fun2(df2), fun3(df2), times = 20)
    # Unit: milliseconds
    #       expr        min         lq       mean     median         uq       max neval
    #  fun1(df1)  502.76809  513.98317  569.47883  597.90488  603.34458  686.4302    20
    #  fun2(df1)  733.12544  741.18777  796.67106  822.60824  828.88449  839.3837    20
    #  fun3(df1)   87.80581   93.07012   95.34281   95.56374   97.49608  101.7991    20
    #  fun1(df2)  672.60241  764.10237  764.60365  772.33959  780.14679  799.3505    20
    #  fun2(df2) 6338.14881 6360.42621 6407.66675 6412.99278 6451.75626 6479.2681    20
    #  fun3(df2)  354.24119  366.47396  382.58666  369.78597  374.01897  468.9197    20
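The dplyr version mentioned above never made it into the benchmark. For reference, here is a minimal sketch of how it might look, assuming dplyr 1.0.0 or later is installed (needed for `slice_max()`). It was not timed here, and `top2_dplyr` is only an illustrative name.

```r
library(dplyr)

# Sample data from the question.
mydf <- data.frame(
  name  = c("marry", "marry", "marry", "marry",
            "mark", "mark", "mark", "john", "john"),
  score = c(98, 77, 87, 96, 99, 44, 79, 87, 77)
)

# Keep the two highest-scoring rows per name.
# with_ties = FALSE caps each group at exactly two rows even when scores tie.
top2_dplyr <- mydf %>%
  group_by(name) %>%
  slice_max(score, n = 2, with_ties = FALSE) %>%
  ungroup()
```

The result is a tibble, and the groups may come back in a different order than in the input, so sort it if the original ordering matters.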