R - selecting text from the middle of a text file with known line numbers
I wrote R code to run an analysis for a research project. I coded it so that it writes a text file reporting the status of the program. The header of the output file looks like this:
start time: 2014-10-23 19:15:04
starting analysis on state model: 16
current correlation state: 1
>>>em_prod_combs
em_prod_combs
h3k18ac_h3k4me1 1.040493e-50
h3k18ac_h3k4me2 3.208806e-77
h3k18ac_h3k4me3 0.0001307375
h3k18ac_h3k9ac 0.001904384
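The ">>>em_prod_combs" marker is on line 4. Line 5 repeats the name once more (that comes from the R code). I'd like the information starting on line 6; it continues for 36 more rows and ends at line 42. There is other text in the file until way down at line 742, which looks like this: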
(742) >>>em_prod_combs
(743) em_actual_perc
(744) h3k18ac_h3k4me1 0
h3k18ac_h3k4me2 0
h3k18ac_h3k4me3 0.0001976819
h3k18ac_h3k9ac 0.001690382
And once again I'd like to select the information starting at line 744 (the actual data, not the headers), go 36 rows, and end at line 780. Here is part of my code:
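filepath <- paste(folder_directory, corr_folders[fi], filename, sep = "")
con <- file(filepath)
open(con)
results.list <- list()
current.line <- 0
storethenext <- FALSE
# read one line at a time until readLines() returns nothing (end of file)
while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0) {
  # flag the marker so that the following lines can be stored
  if (line == ">>>em_prod_combs") {
    storethenext <- TRUE
  }
}
close(con)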
Here, I'm trying to see whether the line just read is the ">>>" marker. If so, I set a variable to TRUE, store the next 36 lines (using a counter variable) in a data frame or list, and then set storethenext back to FALSE. I was kind of hoping there was a better way of doing this...
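For reference, the flag-and-counter idea described above could be completed roughly as follows. This is a minimal sketch, not the original program: it writes a small synthetic file with the same layout (the `markNN` names and values are made up for illustration), skips the one header line after each ">>>em_prod_combs" marker, and stores the next 36 lines of each block in `results.list`.

```r
# Synthetic demo file with the layout described above; mark names/values are made up.
path <- tempfile(fileext = ".txt")
block <- sprintf("mark%02d %.4f", 1:36, (1:36) / 1000)
writeLines(c("start time: 2014-10-23 19:15:04",
             "starting analysis on state model: 16",
             "current correlation state: 1",
             ">>>em_prod_combs", "em_prod_combs", block,    # lines 1-41
             rep("other status text", 700),                 # lines 42-741
             ">>>em_prod_combs", "em_actual_perc", block),  # lines 742-779
           path)

results.list <- list()
con <- file(path, open = "r")
storethenext <- FALSE    # TRUE while inside a block that follows a marker
to_skip <- 0             # header lines still to discard after the marker
remaining <- 0           # data lines still to store for the current block
current <- character(0)
while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0) {
  if (line == ">>>em_prod_combs") {
    storethenext <- TRUE
    to_skip <- 1         # the line right after the marker is a header
    remaining <- 36
    current <- character(0)
  } else if (storethenext) {
    if (to_skip > 0) {
      to_skip <- to_skip - 1
    } else {
      current <- c(current, line)
      remaining <- remaining - 1
      if (remaining == 0) {
        # block complete: save it and stop storing until the next marker
        results.list[[length(results.list) + 1]] <- current
        storethenext <- FALSE
      }
    }
  }
}
close(con)
```

It works, but it needs three pieces of state just to express "skip one line, then take 36", which is why a skip/count based approach is attractive.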
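So I realized that readLines has a parameter, n, that sets how many lines to read. On an open connection each call continues where the previous one stopped, so reading and discarding lines effectively skips them. Based on that, I got this: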
df <- data.frame(name = character(36), params = numeric(36), stringsAsFactors = FALSE)
con <- file(filepath)
open(con)
results.list <- list()
current.line <- 0
firstblock <- readLines(con, n = 5, warn = FALSE)   # lines 1-5: header
firstblock <- NULL                                  # throwaway
firstblock <- readLines(con, n = 36, warn = FALSE)  # lines 6-41: the data
firstblock <- as.list(firstblock)                   # convert to list
for (k in 1:36) {
  splitstring <- strsplit(firstblock[[k]], " ", fixed = TRUE)
  # set info in df: first field is the name, second is the value
  df$name[k] <- splitstring[[1]][1]
  df$params[k] <- as.numeric(splitstring[[1]][2])
}
close(con)
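The per-line loop that fills `df` can also be replaced by a single vectorized split. A minimal sketch, assuming every data line is exactly "name value" separated by one space; the four example lines are taken from the header shown at the top:

```r
# Example lines in the "name value" format from the status file header.
firstblock <- c("h3k18ac_h3k4me1 1.040493e-50",
                "h3k18ac_h3k4me2 3.208806e-77",
                "h3k18ac_h3k4me3 0.0001307375",
                "h3k18ac_h3k9ac 0.001904384")

# Split all lines in one call, then pull out field 1 and field 2 of each piece.
parts <- strsplit(firstblock, " ", fixed = TRUE)
df <- data.frame(
  name   = vapply(parts, `[`, character(1), 1),
  params = as.numeric(vapply(parts, `[`, character(1), 2)),
  stringsAsFactors = FALSE
)
```

`vapply()` with `character(1)` makes R check that each line produced exactly one value per field, so a malformed line fails loudly instead of silently shifting rows.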
But it turns out, as Ben's reply showed, that read.table can do the same thing in one line. I've reduced it down to the following one-liner:
firstblock2 <- read.table(filepath, header = FALSE, sep = " ", skip = 5, nrows = 36)
This creates the data frame implicitly and does all the dirty work for me. The documentation for read.table is here: https://stat.ethz.ch/r-manual/r-devel/library/utils/html/read.table.html
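The same one-liner covers the later block by changing `skip`. If the line numbers ever shift between runs, a sketch like the one below finds each ">>>em_prod_combs" marker with `grep()` and derives `skip` from it. It assumes, as in the layout above, that the data always start two lines after the marker and run for 36 rows; the demo file and its `markNN` rows are synthetic.

```r
# Synthetic demo file with the layout described above; mark names/values are made up.
path <- tempfile(fileext = ".txt")
block <- sprintf("mark%02d %.4f", 1:36, (1:36) / 1000)
writeLines(c("start time: 2014-10-23 19:15:04",
             "starting analysis on state model: 16",
             "current correlation state: 1",
             ">>>em_prod_combs", "em_prod_combs", block,    # lines 1-41
             rep("other status text", 700),                 # lines 42-741
             ">>>em_prod_combs", "em_actual_perc", block),  # lines 742-779
           path)

# Locate the marker lines in the file.
all_lines <- readLines(path, warn = FALSE)
markers <- grep(">>>em_prod_combs", all_lines, fixed = TRUE)

# Data start two lines after each marker, so skip marker + 1 lines and read 36 rows.
blocks <- lapply(markers, function(m) {
  read.table(path, header = FALSE, sep = " ", skip = m + 1, nrows = 36,
             col.names = c("name", "params"), stringsAsFactors = FALSE)
})
```

With this layout `markers` is `c(4, 742)`, so the two calls use `skip = 5` and `skip = 743`, matching the hard-coded values for lines 6 and 744.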