Saturday, 15 March 2014

r - Using separate from tidyr with different length vectors -




I would like to separate a column of strings such as [1, 58, 10] into columns using separate from tidyr. My problem is that sometimes the columns are shorter (never longer). I have many columns with this issue in the same data frame.

Loading packages

require(tidyr)
require(dplyr)
require(stringr)

The data

Here I create a data frame with samples from the real data. The "vectors" are of length 10 in col1 and of length 9 or 10 in col2. There is a time column to show that there are other columns as well.

df <- data.frame(
  time = as.POSIXct(1:5, origin = Sys.time()),
  col1 = c("[0,355,0,0,0,1227,0,0,382059,116]",
           "[0,31,0,0,0,5,0,0,925,1]",
           "[0,1,0,0,0,471,0,0,130339,3946]",
           "[0,0,0,0,0,223,0,0,37666,12]",
           "[0,19,0,0,0,667,0,0,336956,53]"),
  col2 = c("[0,355,0,0,0,1227,0,0,382059,116]",
           "[0,355,0,0,0,1227,0,0,382059,116]",
           "[0,0,0,0,0,223,0,0,37666,12]",
           "[0,19,0,0,0,667,0,0,336956]",
           "[0,355,0,0,0,1227,0,0,382059,116]")
)

How I want it to be

For the first column, where all "vectors" are of equal length, I can use separate() to get what I want.

a1 <- df %>%
  mutate(col1 = str_sub(col1, 2, -2)) %>%
  separate(col1, paste("col1", 1:10, sep = "."), ",")

# making sure the numbers are numeric
a1 <- as.data.frame(sapply(a1, as.numeric)) %>%
  mutate(time = as.POSIXct(time, origin = "1970-01-01")) %>%
  select(-col2)

This results in

> a1
                 time col1.1 col1.2 col1.3 col1.4 col1.5 col1.6 col1.7 col1.8
1 2014-11-07 12:21:45      0    355      0      0      0   1227      0      0
2 2014-11-07 12:21:46      0     31      0      0      0      5      0      0
3 2014-11-07 12:21:47      0      1      0      0      0    471      0      0
4 2014-11-07 12:21:48      0      0      0      0      0    223      0      0
5 2014-11-07 12:21:49      0     19      0      0      0    667      0      0
  col1.9 col1.10
1 382059     116
2    925       1
3 130339    3946
4  37666      12
5 336956      53

This does not work for col2, where one of the elements cannot be split into as many columns.

Workaround

# does not work
# b1 <- df %>%
#   mutate(col2 = str_sub(col2, 2, -2)) %>%
#   separate(col2, paste("col2", 1:10, sep = "."), ",")

b2 <- sapply(
  as.data.frame(str_split_fixed(str_sub(df$col2, 2, -2), ",", n = 10),
                stringsAsFactors = FALSE),
  as.numeric
)
colnames(b2) <- paste("col2", 1:10, sep = ".")
b2 <- as.data.frame(cbind(time = df$time, b2)) %>%
  mutate(time = as.POSIXct(time, origin = "1970-01-01"))

Which results in

> b2
                 time col2.1 col2.2 col2.3 col2.4 col2.5 col2.6 col2.7 col2.8
1 2014-11-07 12:21:45      0    355      0      0      0   1227      0      0
2 2014-11-07 12:21:46      0    355      0      0      0   1227      0      0
3 2014-11-07 12:21:47      0      0      0      0      0    223      0      0
4 2014-11-07 12:21:48      0     19      0      0      0    667      0      0
5 2014-11-07 12:21:49      0    355      0      0      0   1227      0      0
  col2.9 col2.10
1 382059     116
2 382059     116
3  37666      12
4 336956      NA
5 382059     116

If a vector is shorter, the last elements should be NA, so this is correct.
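To see where the NA in the workaround comes from, here is a minimal sketch on a toy input (the vector x is made up for illustration, not taken from the real data). As far as I can tell, str_split_fixed() pads short rows with empty strings, and it is the as.numeric() step that turns those into NA.

```r
library(stringr)

# Toy input (hypothetical): one complete "vector" and one that is a value short.
x <- c("[1,2,3]", "[4,5]")

# Drop the surrounding brackets, then split each string into exactly 3 pieces.
pieces <- str_split_fixed(str_sub(x, 2, -2), ",", n = 3)

# The short row is padded with an empty string, not with NA.
print(pieces)

# Converting column by column: as.numeric("") gives NA, which is where
# the NA in the final result comes from.
nums <- apply(pieces, 2, as.numeric)
print(nums)
```

So the padding always lands on the right-hand side, which matches the requirement that only the last elements may be missing.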

The questions

Is there a way to use separate (or some other, simpler function) instead of the workaround? Is there a way to apply this to col1 and col2 at the same time (by selecting columns that start with col, for example)?

Thanks!

This only answers the first part of your question, about separate. There is an extra argument in separate (at least in the development version of tidyr) that will allow you to do what you want if you set extra to "merge".

df %>%
  mutate(col2 = str_sub(col2, 2, -2)) %>%
  separate(col2, paste("col2", 1:10, sep = "."), ",", extra = "merge")

                 time                              col1
1 2014-11-07 08:00:59 [0,355,0,0,0,1227,0,0,382059,116]
2 2014-11-07 08:01:00          [0,31,0,0,0,5,0,0,925,1]
3 2014-11-07 08:01:01   [0,1,0,0,0,471,0,0,130339,3946]
4 2014-11-07 08:01:02      [0,0,0,0,0,223,0,0,37666,12]
5 2014-11-07 08:01:03    [0,19,0,0,0,667,0,0,336956,53]
  col2.1 col2.2 col2.3 col2.4 col2.5 col2.6 col2.7 col2.8
1      0    355      0      0      0   1227      0      0
2      0    355      0      0      0   1227      0      0
3      0      0      0      0      0    223      0      0
4      0     19      0      0      0    667      0      0
5      0    355      0      0      0   1227      0      0
  col2.9 col2.10
1 382059     116
2 382059     116
3  37666      12
4 336956    <NA>
5 382059     116
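For the second part of the question, here is a rough sketch rather than a definitive answer. It assumes a released tidyr in which separate() has the fill and convert arguments, and a dplyr recent enough to provide across(); the small data frame is a made-up stand-in for the real one. separate() still handles one column per call, so only the bracket stripping is done for all col* columns at once.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Toy stand-in for the real data (hypothetical values): col2 has one short row.
toy <- data.frame(
  id   = 1:2,
  col1 = c("[1,2,3]", "[4,5,6]"),
  col2 = c("[1,2,3]", "[4,5]"),
  stringsAsFactors = FALSE
)

out <- toy %>%
  # Strip the brackets from every column whose name starts with "col".
  mutate(across(starts_with("col"), ~ str_sub(.x, 2, -2))) %>%
  # fill = "right" pads short rows with NA on the right;
  # convert = TRUE turns the new character columns into numbers.
  separate(col1, paste("col1", 1:3, sep = "."), sep = ",",
           fill = "right", convert = TRUE) %>%
  separate(col2, paste("col2", 1:3, sep = "."), sep = ",",
           fill = "right", convert = TRUE)

print(out)
```

With convert = TRUE the separate sapply(..., as.numeric) step should no longer be needed, and other columns such as time are left untouched.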

