r - Using separate from tidyr with different length vectors
I would like to separate a column of strings such as [1, 58, 10] into columns using separate from tidyr. My problem is that some of the strings are shorter (never longer). I have many columns with this issue in the same data frame.
Loading packages

    require(tidyr)
    require(dplyr)
    require(stringr)
The data

Here I create a data frame with samples from the real data. The "vectors" are of length 10 in col1 and of length 9 or 10 in col2. There is a time column to show that there are other columns as well.

    df <- data.frame(
      time = as.POSIXct(1:5, origin = Sys.time()),
      col1 = c("[0,355,0,0,0,1227,0,0,382059,116]",
               "[0,31,0,0,0,5,0,0,925,1]",
               "[0,1,0,0,0,471,0,0,130339,3946]",
               "[0,0,0,0,0,223,0,0,37666,12]",
               "[0,19,0,0,0,667,0,0,336956,53]"),
      col2 = c("[0,355,0,0,0,1227,0,0,382059,116]",
               "[0,355,0,0,0,1227,0,0,382059,116]",
               "[0,0,0,0,0,223,0,0,37666,12]",
               "[0,19,0,0,0,667,0,0,336956]",
               "[0,355,0,0,0,1227,0,0,382059,116]")
    )
How I want it to be

For the first column, where all "vectors" are of equal length, I can use separate() to get what I want.

    a1 <- df %>%
      mutate(col1 = str_sub(col1, 2, -2)) %>%
      separate(col1, paste("col1", 1:10, sep = "."), ",")

    # making sure the numbers are numeric
    a1 <- as.data.frame(sapply(a1, as.numeric)) %>%
      mutate(time = as.POSIXct(time, origin = "1970-01-01")) %>%
      select(-col2)
This results in

    > a1
                     time col1.1 col1.2 col1.3 col1.4 col1.5 col1.6 col1.7 col1.8
    1 2014-11-07 12:21:45      0    355      0      0      0   1227      0      0
    2 2014-11-07 12:21:46      0     31      0      0      0      5      0      0
    3 2014-11-07 12:21:47      0      1      0      0      0    471      0      0
    4 2014-11-07 12:21:48      0      0      0      0      0    223      0      0
    5 2014-11-07 12:21:49      0     19      0      0      0    667      0      0
      col1.9 col1.10
    1 382059     116
    2    925       1
    3 130339    3946
    4  37666      12
    5 336956      53
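A side note on the time column, as a minimal sketch: applying as.numeric to every column also turns the POSIXct values into seconds since 1970-01-01, which is why the code converts time back with that origin afterwards. The timestamp and the UTC time zone below are just assumptions for illustration.

```r
# An example timestamp (made up, fixed to UTC so the result is reproducible).
t0 <- as.POSIXct("2014-11-07 12:21:45", tz = "UTC")

# as.numeric() drops the class and leaves seconds since the Unix epoch.
secs <- as.numeric(t0)

# Converting back with the epoch as origin recovers the original time.
t1 <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")

t1 == t0
```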
This does not work for col2, where some elements can't be split into as many columns.

Workaround

    # does not work
    # b1 <- df %>%
    #   mutate(col2 = str_sub(col2, 2, -2)) %>%
    #   separate(col2, paste("col2", 1:10, sep = "."), ",")

    b2 <- sapply(as.data.frame(str_split_fixed(str_sub(df$col2, 2, -2), ",", n = 10),
                               stringsAsFactors = FALSE),
                 as.numeric)
    colnames(b2) <- paste("col2", 1:10, sep = ".")
    b2 <- as.data.frame(cbind(time = df$time, b2)) %>%
      mutate(time = as.POSIXct(time, origin = "1970-01-01"))
Which results in

    > b2
                     time col2.1 col2.2 col2.3 col2.4 col2.5 col2.6 col2.7 col2.8
    1 2014-11-07 12:21:45      0    355      0      0      0   1227      0      0
    2 2014-11-07 12:21:46      0    355      0      0      0   1227      0      0
    3 2014-11-07 12:21:47      0      0      0      0      0    223      0      0
    4 2014-11-07 12:21:48      0     19      0      0      0    667      0      0
    5 2014-11-07 12:21:49      0    355      0      0      0   1227      0      0
      col2.9 col2.10
    1 382059     116
    2 382059     116
    3  37666      12
    4 336956      NA
    5 382059     116
If a vector is shorter, the last elements shall be NA, so this is correct.
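For what it's worth, the workaround seems to rely on two behaviours: str_split_fixed() pads missing pieces with empty strings, and as.numeric("") returns NA. A minimal sketch with a made-up three-element string (not the real data):

```r
library(stringr)

# A made-up "vector" with 3 elements, split into 4 columns.
pieces <- str_split_fixed("0,19,0", ",", n = 4)

# The missing fourth piece comes back as an empty string ...
pieces[1, 4] == ""

# ... which as.numeric() turns into NA.
nums <- as.numeric(pieces)
is.na(nums[4])
```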
The questions

Is there a way to use separate (or some other simpler function) instead of the workaround? Is there a way to apply this to col1 and col2 at the same time (by selecting columns that start with col, for example)?

Thanks!
This answers the first part of your question using separate. There is an extra argument in separate (at least in the development version of tidyr) that will allow you to do what you want if you set extra to "merge".
    df %>%
      mutate(col2 = str_sub(col2, 2, -2)) %>%
      separate(col2, paste("col2", 1:10, sep = "."), ",", extra = "merge")

                     time                              col1
    1 2014-11-07 08:00:59 [0,355,0,0,0,1227,0,0,382059,116]
    2 2014-11-07 08:01:00          [0,31,0,0,0,5,0,0,925,1]
    3 2014-11-07 08:01:01   [0,1,0,0,0,471,0,0,130339,3946]
    4 2014-11-07 08:01:02      [0,0,0,0,0,223,0,0,37666,12]
    5 2014-11-07 08:01:03    [0,19,0,0,0,667,0,0,336956,53]
      col2.1 col2.2 col2.3 col2.4 col2.5 col2.6 col2.7 col2.8
    1      0    355      0      0      0   1227      0      0
    2      0    355      0      0      0   1227      0      0
    3      0      0      0      0      0    223      0      0
    4      0     19      0      0      0    667      0      0
    5      0    355      0      0      0   1227      0      0
      col2.9 col2.10
    1 382059     116
    2 382059     116
    3  37666      12
    4 336956    <NA>
    5 382059     116
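Two additions, offered as a sketch rather than a definitive answer. In tidyr releases newer than the development version used above, padding short rows appears to be handled by a separate fill argument (fill = "right"), while extra only deals with rows that have too many pieces. For the second part of the question, one option is to loop over the columns whose names start with col. The toy data, the helper name split_bracket_cols and its n argument are assumptions made up for illustration, and gsub() is used here in place of str_sub() only to keep the example short.

```r
library(tidyr)

# Toy data (made up): col1 always has 3 pieces, col2 has 2 or 3.
df <- data.frame(
  id   = 1:2,
  col1 = c("[1,2,3]", "[4,5,6]"),
  col2 = c("[7,8,9]", "[10,11]"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split every column whose name starts with "col"
# into n new columns, padding short rows with NA on the right.
split_bracket_cols <- function(data, n) {
  targets <- grep("^col", names(data), value = TRUE)
  for (nm in targets) {
    # Strip the surrounding square brackets.
    data[[nm]] <- gsub("^\\[|\\]$", "", data[[nm]])
    data <- separate(
      data, !!nm,
      into    = paste(nm, seq_len(n), sep = "."),
      sep     = ",",
      fill    = "right",  # short rows get NA in the last columns
      convert = TRUE      # convert the character pieces to numbers
    )
  }
  data
}

out <- split_bracket_cols(df, n = 3)
out
```

With convert = TRUE the sapply(..., as.numeric) step and the time round trip from the question should no longer be needed, since the other columns are left untouched.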