My Blog: R. Import many .csv files at once, while skipping first 7 lines and giving new column identifier info from first 7 lines

Wednesday, 15 May 2013

I am trying to import 9000 .csv files into R to create one master file, and I would like to be able to do this much more efficiently than with

    read.csv(file="filename", header=TRUE, sep="\t")

Furthermore, I want to skip the first 7 lines in each .csv, which contain information about the file, but not before retrieving the information from those lines and adding it as new columns in the data file, so that I can identify each file later on.

I've used the skip=7 option when importing individual .csv files before with no issue, but I haven't been able to import multiple files at one time, let alone take the information from the first 7 lines first.

I've tried reading in many .csv files from one folder using the following code:

    temp = list.files(pattern="*.csv")
    myfiles = lapply(temp, read.delim)

Every .csv takes the following format:

    program 5.5.3
    "rawfilename=""c:\...."""
    from=0:00.0, to=3:32:13.7
    date=24may2014
    athlete=john smith
    eventdescription=round 10 v team b
    time var1 var2 var3 var4 var5
    0:00  0    0    0    0    0
    0:01  1    1    4    0    0

and I want the code to turn it into this:

    time    var1 var2 var3 var4 var5  from    to         date       athlete     event description
    0:00.0  0    0    0    0    0     0:00.0  3:32:13.7  24may2014  john smith  round 10 v team b
    0:00.1  1    1    4    0    0     0:00.0  3:32:13.7  24may2014  john smith  round 10 v team b

The next athlete would then be added below, following the same format, and so on.

Has anyone else had a similar thing they've wanted to accomplish, and if so, how did you do it?

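This is pretty much a brute-force method, since I didn't use any clever regex or anything, but if all the files are constructed in the same way, the following might work:

I read the file with readLines and split each line with strsplit, which gives a list looking like this: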

    # [[1]]
    # [1] "program 5.5.3"
    #
    # [[2]]
    # [1] "\"rawfilename"      "\"\"c:\\....\"\"\""
    #
    # [[3]]
    # [1] "from"      "0:00.0"    "to"        "3:32:13.7"
    #
    # [[4]]
    # [1] "date"      "24may2014"
    #
    # [[5]]
    # [1] "athlete"    "john smith"
    #
    # [[6]]
    # [1] "eventdescription"  "round 10 v team b"
    #
    # [[7]]
    # [1] "time var1 var2 var3 var4 var5"
    #
    # [[8]]
    # [1] "0:00  0    0    0    0    0"
    #
    # [[9]]
    # [1] "0:01  1    1    4    0    0"

and made a simple function to process the data by selecting the proper list items and elements:

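    f <- function(filepath) {
      # read every line of the file, then close the connection
      dat <- readLines(con <- file(filepath), warn = FALSE)
      close(con)
      # split each line on ", " or "=" to separate keys from values
      x <- strsplit(dat, ', |=')
      # items 7 to 9 are the header row and the two data rows of the sample file
      res <- read.table(text = do.call(rbind, x[7:9]), header = TRUE,
                        stringsAsFactors = FALSE)
      # attach the metadata values as extra columns
      res <- within(res, {
        'event description' <- x[[6]][2]
        athlete <- x[[5]][2]
        date <- x[[4]][2]
        to <- x[[3]][4]
        from <- x[[3]][2]
      })
      return(res)
    }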

So give it a file name and you get this:

    f('~/desktop/tmp.csv')

    #   time var1 var2 var3 var4 var5   from        to      date    athlete   event description
    # 1 0:00    0    0    0    0    0 0:00.0 3:32:13.7 24may2014 john smith 1 round 10 v team b
    # 2 0:01    1    1    4    0    0 0:00.0 3:32:13.7 24may2014 john smith 2 round 10 v team b
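Since f only picks list items 7 to 9, it reads just the two data rows of the sample file. The following is a minimal sketch of a variant for files of any length, not the method from the answer above. It assumes every file has exactly six metadata lines followed by a header row, as in the sample. The names read_one, n_meta and demo are hypothetical, and the third data row in the demonstration file is made up for illustration.

```r
# Hypothetical helper: parse the metadata lines, then read the table below them.
read_one <- function(filepath, n_meta = 6) {
  # Read only the metadata lines at the top of the file.
  meta <- readLines(filepath, n = n_meta, warn = FALSE)

  # Break lines such as "from=0:00.0, to=3:32:13.7" into "key=value" pieces.
  parts <- unlist(strsplit(meta, ", ", fixed = TRUE))
  kv <- parts[grepl("=", parts, fixed = TRUE)]
  vals <- setNames(sub("^[^=]*=", "", kv), sub("=.*$", "", kv))

  # Skip the metadata; the next line is the header, then all data rows.
  res <- read.table(filepath, skip = n_meta, header = TRUE,
                    stringsAsFactors = FALSE)

  # Attach the metadata as identifying columns (errors if a key is missing).
  res$from <- vals[["from"]]
  res$to <- vals[["to"]]
  res$date <- vals[["date"]]
  res$athlete <- vals[["athlete"]]
  res[["event description"]] <- vals[["eventdescription"]]
  res
}

# Demonstration on a temporary copy of the sample file.
sample_lines <- c(
  "program 5.5.3",
  "\"rawfilename=\"\"c:\\....\"\"\"",
  "from=0:00.0, to=3:32:13.7",
  "date=24may2014",
  "athlete=john smith",
  "eventdescription=round 10 v team b",
  "time var1 var2 var3 var4 var5",
  "0:00  0    0    0    0    0",
  "0:01  1    1    4    0    0",
  "0:02  2    1    4    1    0"  # made-up extra row
)
tmp <- tempfile(fileext = ".csv")
writeLines(sample_lines, tmp)
demo <- read_one(tmp)
print(demo)
```

Letting read.table do the skipping means the data rows are never passed through strsplit, so a stray "=" or ", " in the data would not break them apart.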

and you can repeat the process for all the files and merge them:

    ## untested
    do.call(rbind.data.frame, Map(f, all_file_paths))
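A rough sketch of how the list of file paths could be built and the pieces stacked is shown below. It is an illustration under assumptions rather than a tested solution for the 9000 files: read_folder, plain_reader, source_file and master are hypothetical names, the two demonstration files are made up, and plain_reader merely stands in for f. Note that the pattern argument of list.files is a regular expression, so "\\.csv$" is used here instead of "*.csv".

```r
# Hypothetical wrapper: apply a single-file reader to every .csv in a folder
# and stack the results into one data frame.
read_folder <- function(dir, reader) {
  # full.names = TRUE returns paths that can be opened from any working directory.
  paths <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  pieces <- lapply(paths, function(p) {
    d <- reader(p)
    d$source_file <- basename(p)  # record which file each row came from
    d
  })
  # Bind once at the end instead of growing a data frame inside a loop.
  do.call(rbind, pieces)
}

# Self-contained demonstration with two tiny made-up files.
tmp_dir <- tempfile("csv_demo")
dir.create(tmp_dir)
writeLines(c("time var1", "0:00 0", "0:01 1"), file.path(tmp_dir, "a.csv"))
writeLines(c("time var1", "0:00 5"), file.path(tmp_dir, "b.csv"))

# Stand-in reader; in practice this would be f or a similar function.
plain_reader <- function(p) {
  read.table(p, header = TRUE, stringsAsFactors = FALSE)
}

master <- read_folder(tmp_dir, plain_reader)
print(master)
```

With thousands of files the final bind may dominate the run time; data.table::rbindlist is one alternative to do.call(rbind, ...) that may be faster, although it has not been tried on these files.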

r csv import lapply skip
