R: Import many .csv files at once, skipping the first 7 lines and adding new identifier columns from those lines
I am trying to import 9000 .csv files into R to create one master file, and I would like to be able to do this much more efficiently than with
read.csv(file="filename", header=TRUE, sep="\t")
Furthermore, I want to skip the first 7 lines in each .csv, which contain information about the .csv file, but not before I retrieve the information from those lines and add it as new columns in the data, so that I can identify each subsequent file later on.
I've used the skip=7 option when importing individual .csv files before with no issue, but I haven't been able to import multiple files at one time, let alone take the information from the first 7 lines first.
I've tried reading in many .csv files from one folder using the following code:
temp = list.files(pattern="*.csv")
myfiles = lapply(temp, read.delim)
Every .csv has the following format:
program 5.5.3
"rawfilename=""c:\...."""
from=0:00.0, to=3:32:13.7
date=24may2014
athlete=john smith
eventdescription=round 10 v team b
time var1 var2 var3 var4 var5
0:00 0 0 0 0 0
0:01 1 1 4 0 0
And I want the code to turn them into:
time    var1  var2  var3  var4  var5  from    to         date       athlete     event description
0:00.0  0     0     0     0     0     0:00.0  3:32:13.7  24may2014  john smith  round 10 v team b
0:00.1  1     1     4     0     0     0:00.0  3:32:13.7  24may2014  john smith  round 10 v team b
The next athlete is then added below, following the same format, and so on.
Has anyone else had a similar thing they've wanted to accomplish and, if so, how did you do it?
This is pretty much a brute-force method, since I didn't use clever regex or anything, but if all your files are constructed in this way, the following might work.
I used readLines and split each line with strsplit,
with the input looking like this:
# [[1]]
# [1] "program 5.5.3"
#
# [[2]]
# [1] "\"rawfilename"        "\"\"c:\\....\"\"\""
#
# [[3]]
# [1] "from"      "0:00.0"    "to"        "3:32:13.7"
#
# [[4]]
# [1] "date"      "24may2014"
#
# [[5]]
# [1] "athlete"    "john smith"
#
# [[6]]
# [1] "eventdescription"  "round 10 v team b"
#
# [[7]]
# [1] "time var1 var2 var3 var4 var5"
#
# [[8]]
# [1] "0:00 0 0 0 0 0"
#
# [[9]]
# [1] "0:01 1 1 4 0 0"
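To make the splitting step concrete, here is a minimal sketch on a single header line copied from the example file. It assumes, as the function further down does, that the split pattern is `', |='`; the variable names `line` and `parts` are just for illustration.

```r
# One header line, copied from the example file in the question
line <- "from=0:00.0, to=3:32:13.7"

# strsplit() treats the pattern as a regular expression:
# split on either ", " or "="
parts <- strsplit(line, ", |=")[[1]]

# Each value sits right after its key, so the values are at positions 2 and 4
from <- parts[2]
to   <- parts[4]
```

This is why the function picks elements such as `x[[3]][2]` and `x[[3]][4]`: the list item is the line number and the element is the position within the split line.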
And I made a simple function to process the data by selecting the proper list items and elements:
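f <- function(filepath) {
  # read the raw lines, then close the connection
  dat <- readLines(con <- file(filepath), warn = FALSE)
  close(con)
  # split each line on ", " or "="
  x <- strsplit(dat, ', |=')
  # items 7:9 are the column header and the two data rows of the example file
  res <- read.table(text = do.call(rbind, x[7:9]), header = TRUE,
                    stringsAsFactors = FALSE)
  # add the header values as new columns (within() appends them in reverse order)
  res <- within(res, {
    'event description' <- x[[6]][2]
    athlete <- x[[5]][2]
    date <- x[[4]][2]
    to <- x[[3]][4]
    from <- x[[3]][2]
  })
  return(res)
}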
So you give it a file name and get this:
f('~/desktop/tmp.csv')
#   time var1 var2 var3 var4 var5   from        to      date    athlete
# 1 0:00    0    0    0    0    0 0:00.0 3:32:13.7 24may2014 john smith
# 2 0:01    1    1    4    0    0 0:00.0 3:32:13.7 24may2014 john smith
#   event description
# 1 round 10 v team b
# 2 round 10 v team b
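Since f() only reads list items 7:9 (the header plus two data rows), real files with more rows would need a wider range. A possible variant, sketched here under assumptions rather than taken from the answer: it assumes six metadata lines followed by a column header on line 7, and whitespace-separated data rows with no spaces inside values. The name `read_with_meta`, the column name `event_description`, and the third data row in the demo file are made up for illustration.

```r
# Hypothetical variant of f(): metadata from lines 1-6, data from line 7 onward
read_with_meta <- function(filepath) {
  # Split the six metadata lines on ", " or "=", as f() does
  meta <- strsplit(readLines(filepath, n = 6, warn = FALSE), ", |=")

  # Skip the metadata lines; line 7 becomes the column header
  res <- read.table(filepath, skip = 6, header = TRUE,
                    stringsAsFactors = FALSE)

  # Attach the metadata values as constant columns
  res$from              <- meta[[3]][2]
  res$to                <- meta[[3]][4]
  res$date              <- meta[[4]][2]
  res$athlete           <- meta[[5]][2]
  res$event_description <- meta[[6]][2]
  res
}

# Demo file in the question's format (the last data row is invented)
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "program 5.5.3",
  "\"rawfilename=\"\"c:\\....\"\"\"",
  "from=0:00.0, to=3:32:13.7",
  "date=24may2014",
  "athlete=john smith",
  "eventdescription=round 10 v team b",
  "time var1 var2 var3 var4 var5",
  "0:00 0 0 0 0 0",
  "0:01 1 1 4 0 0",
  "0:02 2 0 1 0 0"
), tmp)

out <- read_with_meta(tmp)
```

Letting read.table() handle the data rows means the number of rows per file no longer has to be known in advance.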
And you can repeat the process for all your files and merge them:
## untested
do.call(rbind.data.frame, Map(f, all_file_paths))
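The line above leaves open where all_file_paths comes from. One way it might be built and used is sketched below, with assumptions stated: `combine_files` is a hypothetical helper, and the demo uses two throwaway plain .csv files with read.csv as the per-file reader so that it runs on its own. In practice the reader would be a function like f.

```r
# Apply a per-file reader to every path and stack the results row-wise.
# `reader` is any function that maps a file path to a data frame.
combine_files <- function(paths, reader) {
  frames <- lapply(paths, reader)
  do.call(rbind, frames)
}

# Demo: two small files in a temporary folder
demo_dir <- tempfile("csvdemo")
dir.create(demo_dir)
write.csv(data.frame(time = "0:00", var1 = 0),
          file.path(demo_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(time = "0:01", var1 = 1),
          file.path(demo_dir, "b.csv"), row.names = FALSE)

# `pattern` is a regular expression, so "\\.csv$" matches names ending in .csv;
# full.names = TRUE returns paths that work from any working directory
all_file_paths <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

master <- combine_files(all_file_paths, read.csv)

# Write the combined data out as one master file
master_path <- tempfile(fileext = ".csv")
write.csv(master, master_path, row.names = FALSE)
```

Note that list.files() interprets `pattern` as a regular expression rather than a shell glob, which is why this sketch uses "\\.csv$" instead of "*.csv".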