Wednesday, 15 May 2013

R. Import many .csv files at once, while skipping the first 7 lines and adding new identifier columns with info from those first 7 lines




I am trying to import 9000 .csv files into R to create one master file, and I would like to be able to do this much more efficiently than with

    read.csv(file = "filename", header = TRUE, sep = "\t")

Furthermore, I want to skip the first 7 lines in each .csv, which contain info about the .csv file, but not before I retrieve the info from those lines and add it as new columns in the data file, so that I can identify each subsequent file later on.

I've used the skip=7 option when importing individual .csv files before with no issue, but I haven't been able to import multiple files at one time, let alone take the info from the first 7 lines first.

I've tried reading in many .csv files from one folder using the following code:

    temp = list.files(pattern="*.csv")
    myfiles = lapply(temp, read.delim)
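Extra arguments given to `lapply` are passed on to the reader function, so `skip` can be applied to every file in one call. A minimal sketch, assuming whitespace-separated data and six metadata lines before the column-header row; the temporary folder, the file names, and the demo contents are made up for illustration:

```r
# Build two small example files in a temporary folder (hypothetical data).
dir_path <- file.path(tempdir(), "csv_skip_demo")
dir.create(dir_path, showWarnings = FALSE)
sample_lines <- c(
  "program 5.5.3",
  "\"rawfilename=\"\"c:\\....\"\"\"",
  "from=0:00.0, to=3:32:13.7",
  "date=24may2014",
  "athlete=john smith",
  "eventdescription=round 10 v team b",
  "time var1 var2 var3 var4 var5",
  "0:00 0 0 0 0 0",
  "0:01 1 1 4 0 0"
)
writeLines(sample_lines, file.path(dir_path, "a.csv"))
writeLines(sample_lines, file.path(dir_path, "b.csv"))

# list.files() expects a regular expression, not a shell glob,
# so "\\.csv$" matches names ending in ".csv".
files <- list.files(dir_path, pattern = "\\.csv$", full.names = TRUE)

# Arguments after the function name are forwarded to read.table():
# skip the six metadata lines, then treat the next line as the header.
myfiles <- lapply(files, read.table, skip = 6, header = TRUE,
                  stringsAsFactors = FALSE)
```

This reads the data rows of every file but still discards the metadata, which is the part the rest of the question is about.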

Every .csv takes the following format:

    program 5.5.3
    "rawfilename=""c:\...."""
    from=0:00.0, to=3:32:13.7
    date=24may2014
    athlete=john smith
    eventdescription=round 10 v team b
    time var1 var2 var3 var4 var5
    0:00 0 0 0 0 0
    0:01 1 1 4 0 0

and I want the code to turn them into:

    time   var1 var2 var3 var4 var5 from   to        date      athlete    event description
    0:00.0 0    0    0    0    0    0:00.0 3:32:13.7 24may2014 john smith round 10 v team b
    0:00.1 1    1    4    0    0    0:00.0 3:32:13.7 24may2014 john smith round 10 v team b

The next athlete would be added below, following the same format, and so on.

Has anyone else had a similar thing they've wanted to accomplish, and if so, how did you do it?

This is very much a brute-force method, since I didn't use any clever regex or anything, but if all the files are constructed in this way, the following might work:

I used readLines on the input and split each line, which gives something looking like this:

    # [[1]]
    # [1] "program 5.5.3"
    #
    # [[2]]
    # [1] "\"rawfilename"        "\"\"c:\\....\"\"\""
    #
    # [[3]]
    # [1] "from"      "0:00.0"    "to"        "3:32:13.7"
    #
    # [[4]]
    # [1] "date"      "24may2014"
    #
    # [[5]]
    # [1] "athlete"    "john smith"
    #
    # [[6]]
    # [1] "eventdescription"  "round 10 v team b"
    #
    # [[7]]
    # [1] "time var1 var2 var3 var4 var5"
    #
    # [[8]]
    # [1] "0:00 0 0 0 0 0"
    #
    # [[9]]
    # [1] "0:01 1 1 4 0 0"
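For reference, a list like the one above comes from splitting each raw line on either a comma followed by a space or an equals sign. A small sketch using a few inline lines instead of a file, so it runs on its own (the indices therefore differ from those of the full nine-line file):

```r
# A few lines of the sample file, given inline for illustration.
dat <- c(
  "from=0:00.0, to=3:32:13.7",
  "date=24may2014",
  "athlete=john smith",
  "time var1 var2 var3 var4 var5"
)

# Split on ", " or "="; lines containing neither stay as a single string.
x <- strsplit(dat, ", |=")

x[[1]]     # "from" "0:00.0" "to" "3:32:13.7"
x[[3]][2]  # "john smith"
x[[4]]     # "time var1 var2 var3 var4 var5"
```

Each metadata value sits at an even position in its list element, which is why the values can be picked out by fixed indices.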

and made a simple function to process the info by selecting the proper list items and elements:

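    f <- function(filepath) {
      # Read the raw lines, then close the connection explicitly
      dat <- readLines(con <- file(filepath), warn = FALSE)
      close(con)
      # Split each line on ", " or "=" to separate keys from values
      x <- strsplit(dat, ', |=')
      # Line 7 is the column header; everything after it is data
      # (the sample file ends at line 9)
      res <- read.table(text = do.call(rbind, x[7:length(x)]), header = TRUE,
                        stringsAsFactors = FALSE)
      # within() appends new columns in reverse order of assignment
      res <- within(res, {
        'event description' <- x[[6]][2]
        athlete <- x[[5]][2]
        date <- x[[4]][2]
        to <- x[[3]][4]
        from <- x[[3]][2]
      })
      return(res)
    }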

So you give it a file name and get this:

    f('~/desktop/tmp.csv')
    #   time var1 var2 var3 var4 var5   from        to      date    athlete
    # 1 0:00    0    0    0    0    0 0:00.0 3:32:13.7 24may2014 john smith
    # 2 0:01    1    1    4    0    0 0:00.0 3:32:13.7 24may2014 john smith
    #   event description
    # 1 round 10 v team b
    # 2 round 10 v team b

And you can repeat the process for all the files and merge them:

    ## untested
    do.call(rbind.data.frame, Map(f, all_file_paths))
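Since the line above is marked untested, here is one way the whole pipeline might be wired together. It is a sketch under assumptions, not the answer's own code: `read_one` and `make_demo` are hypothetical helpers, `read_one` is a variant of `f` that keeps every data row, the column is named `event_description` for convenience, and the two demo files (including the "second athlete" label) are written to a temporary folder purely for illustration.

```r
# Hypothetical variant of f(): metadata on lines 3-6, column header on line 7.
read_one <- function(filepath) {
  dat <- readLines(filepath, warn = FALSE)
  x <- strsplit(dat[1:6], ", |=")
  # Parse line 7 onwards as a whitespace-separated table.
  res <- read.table(text = dat[7:length(dat)], header = TRUE,
                    stringsAsFactors = FALSE)
  # Attach the metadata values as constant columns.
  res$from <- x[[3]][2]
  res$to <- x[[3]][4]
  res$date <- x[[4]][2]
  res$athlete <- x[[5]][2]
  res$event_description <- x[[6]][2]
  res
}

# Write a demo file in the format shown in the question.
make_demo <- function(path, athlete) {
  writeLines(c(
    "program 5.5.3",
    "\"rawfilename=\"\"c:\\....\"\"\"",
    "from=0:00.0, to=3:32:13.7",
    "date=24may2014",
    paste0("athlete=", athlete),
    "eventdescription=round 10 v team b",
    "time var1 var2 var3 var4 var5",
    "0:00 0 0 0 0 0",
    "0:01 1 1 4 0 0"
  ), path)
}

demo_dir <- file.path(tempdir(), "csv_merge_demo")
dir.create(demo_dir, showWarnings = FALSE)
make_demo(file.path(demo_dir, "a.csv"), "john smith")
make_demo(file.path(demo_dir, "b.csv"), "second athlete")

# Collect the paths, read each file, and stack the results.
all_file_paths <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
master <- do.call(rbind, lapply(all_file_paths, read_one))
```

Here `master` holds the rows of both files, each tagged with its own metadata. Using `lapply` rather than `Map` avoids row names being prefixed with the file paths.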

r csv import lapply skip
