Tuesday, 15 June 2010

csv - Separating arrow-separated values in a data frame into an unequal number of columns using R




I have a data frame with the following sample values.

[1] "entry.cei"
[2] "entry.lifecycle->hist.open.personal demand chequing account->exit.lifecycle->entry.cei"
[3] "entry.lifecycle->hist.open.personal demand savings account->exit.lifecycle->entry.cei"
[4] "entry.transaction->txn.no source available->exit.transaction->entry.cei"
[5] "entry.branch->exit.branch->entry.transaction->txn.in-branch->exit.transaction->entry.cei"

I need to split them on "->" and put the pieces in different columns, v1, v2 and so on. For example:

  v1              v2                                         v3             v4        v5 v6 v7
1 entry.cei
2 entry.lifecycle hist.open.personal demand chequing account exit.lifecycle entry.cei
3 entry.lifecycle hist.open.personal demand savings account  exit.lifecycle entry.cei

How can I accomplish this in R? I tried using rbind with strsplit(), but I think it requires an equal number of columns.

The easiest way is to use gsub() to replace each -> with a comma and then use read.csv(). If your data already contain commas, replace -> with > instead of a comma and pass sep = ">" to read.csv(); that should work fine.

read.csv(text = gsub("->", ",", x, fixed = TRUE), header = FALSE)
#                  V1                                         V2                V3            V4               V5        V6
# 1         entry.cei
# 2   entry.lifecycle hist.open.personal demand chequing account    exit.lifecycle     entry.cei
# 3   entry.lifecycle  hist.open.personal demand savings account    exit.lifecycle     entry.cei
# 4 entry.transaction                    txn.no source available  exit.transaction     entry.cei
# 5      entry.branch                                exit.branch entry.transaction txn.in-branch exit.transaction entry.cei

Or, alternatively:

read.table(text = gsub("->", ",", x, fixed = TRUE), sep = ",", fill = TRUE)
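If the values themselves contain commas, the same idea should work with a different delimiter. The following is only a sketch: the vector `y` is made up for illustration (it is not from the question), and it assumes that `>` never appears in the data except as part of the arrow.

```r
# Hypothetical input: the second field of the first element contains a comma.
y <- c("entry.lifecycle->hist.open, personal->entry.cei", "entry.cei")

# Replace the two-character arrow with a single ">" and tell read.csv()
# to split on that character instead of on commas.
res <- read.csv(text = gsub("->", ">", y, fixed = TRUE),
                sep = ">", header = FALSE, stringsAsFactors = FALSE)

ncol(res)    # number of columns found
res$V2[1]    # the comma stays inside this field
```

Setting stringsAsFactors = FALSE explicitly keeps the columns as character vectors on older R versions, where factors were the default.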

You can still use rbind and strsplit(), as long as you make the list elements the same length first. The `length<-` replacement function can help with that.

s <- strsplit(x, "->", fixed = TRUE)
data.frame(do.call(rbind, lapply(s, `length<-`, max(sapply(s, length)))))
#                  X1                                         X2                X3            X4               X5        X6
# 1         entry.cei                                       <NA>              <NA>          <NA>             <NA>      <NA>
# 2   entry.lifecycle hist.open.personal demand chequing account    exit.lifecycle     entry.cei             <NA>      <NA>
# 3   entry.lifecycle  hist.open.personal demand savings account    exit.lifecycle     entry.cei             <NA>      <NA>
# 4 entry.transaction                    txn.no source available  exit.transaction     entry.cei             <NA>      <NA>
# 5      entry.branch                                exit.branch entry.transaction txn.in-branch exit.transaction entry.cei

where the original x vector is

x <- c("entry.cei",
       "entry.lifecycle->hist.open.personal demand chequing account->exit.lifecycle->entry.cei",
       "entry.lifecycle->hist.open.personal demand savings account->exit.lifecycle->entry.cei",
       "entry.transaction->txn.no source available->exit.transaction->entry.cei",
       "entry.branch->exit.branch->entry.transaction->txn.in-branch->exit.transaction->entry.cei")
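To see why the padding step lets rbind work, it may help to look at `length<-` on its own. A small sketch, using a made-up vector `v` that is not part of the question:

```r
# Hypothetical two-element vector, stretched to length 4.
v <- c("a", "b")

# Calling the replacement function by name returns the lengthened vector;
# the new positions are filled with NA.
padded <- `length<-`(v, 4)

padded
is.na(padded)
```

Once every element of the strsplit() result has been padded this way to the longest length, rbind gets rows of equal length and the short rows simply end in NA.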

r csv strsplit
