My Blog: r - selecting text from a middle of a textfile with known line numbers -

Sunday, 15 January 2012

R - selecting text from the middle of a text file with known line numbers

I wrote some R code to run an analysis for a research project. I coded it so that it writes a text file with the status of the program. The header of the output file looks like this:

    start time: 2014-10-23 19:15:04
    starting analysis on state model: 16
    current correlation state: 1
    >>>em_prod_combs
    em_prod_combs
    h3k18ac_h3k4me1 1.040493e-50
    h3k18ac_h3k4me2 3.208806e-77
    h3k18ac_h3k4me3 0.0001307375
    h3k18ac_h3k9ac 0.001904384

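The ">>>em_prod_combs" marker is on line 4. Line 5 repeats the name once more (that comes from my R code). I'd like the data starting on line 6. The data goes on for 36 more rows and ends at line 42. There is other text in the file until way down at line 742, which looks like this: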

    (742) >>>em_prod_combs
    (743) em_actual_perc
    (744) h3k18ac_h3k4me1 0
          h3k18ac_h3k4me2 0
          h3k18ac_h3k4me3 0.0001976819
          h3k18ac_h3k9ac 0.001690382

And once again, I'd like to select the data starting at line 744 (the actual data, not the headers), go down 36 rows, and end at line 780. Here is part of my code:

    filepath <- paste(folder_directory, corr_folders[fi], filename, sep = "")
    con <- file(filepath)
    open(con)
    results.list <- list()
    current.line <- 0
    # read one line at a time; readLines() returns character(0) at end of file
    while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0) {
      if (line == ">>>em_prod_combs") {
        storethenext <- TRUE  # marker found: the data block follows
      }
    }
    close(con)

Here, I'm trying to see whether the line just read has the ">>>" mark. If so, I set a variable to TRUE, store the next 36 lines (using a counter variable) in a data frame or list, and then set the storethenext variable back to FALSE. I was kind of hoping there is a better way of doing this...
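The flag-and-counter idea described above can be completed along these lines. This is a minimal base-R sketch, assuming the layout shown earlier: a marker line, exactly one header line (`em_prod_combs` or `em_actual_perc`), then the data lines. The function name `extract_blocks()` and its `skip_after` and `n` arguments are hypothetical, chosen for illustration.

```r
# Minimal sketch of the flag-and-counter approach: read the file one line
# at a time and keep the n data lines that follow each marker line.
# extract_blocks() is a hypothetical helper, not a base R function.
extract_blocks <- function(filepath, marker = ">>>em_prod_combs",
                           skip_after = 1, n = 36) {
  con <- file(filepath, open = "r")
  on.exit(close(con))        # close the connection even if an error occurs
  blocks <- list()           # one character vector per complete block
  current <- character(0)    # data lines collected for the current block
  to_skip <- 0               # header lines still to discard
  remaining <- 0             # data lines still to collect
  while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0) {
    if (remaining == 0 && line == marker) {
      # marker found: start a new block
      to_skip <- skip_after
      remaining <- n
      current <- character(0)
    } else if (to_skip > 0) {
      # discard the header line(s), e.g. "em_prod_combs" or "em_actual_perc"
      to_skip <- to_skip - 1
    } else if (remaining > 0) {
      current <- c(current, line)
      remaining <- remaining - 1
      if (remaining == 0) blocks[[length(blocks) + 1]] <- current
    }
  }
  blocks  # a block cut short by the end of the file is not included
}
```

For a file laid out as described, `extract_blocks(filepath)` should return a list with one 36-element character vector per marker, which can then be split into names and values.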

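So I realized readLines can skip lines for me: on an open connection, each call picks up where the previous one stopped, so reading the first n = 5 lines and throwing them away skips them. Based on that, I got this: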

    df <- data.frame(name = character(36),  # one row per data line
                     params = numeric(36),
                     stringsAsFactors = FALSE)
    con <- file(filepath)
    open(con)
    results.list <- list()
    current.line <- 0
    firstblock <- readLines(con, n = 5, warn = FALSE)
    firstblock <- NULL  # throw away the first 5 lines
    firstblock <- readLines(con, n = 36, warn = FALSE)
    close(con)
    firstblock <- as.list(firstblock)  # convert to a list
    for (k in 1:36) {
      splitstring <- strsplit(firstblock[[k]], " ", fixed = TRUE)
      # set the data in df (assumes "<name> <value>" separated by one space)
      df$name[k] <- splitstring[[1]][1]
      df$params[k] <- as.numeric(splitstring[[1]][2])
    }
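The loop that fills `df` row by row can also be replaced by splitting all lines in one call. A minimal sketch, assuming each data line is a name and a number separated by a single space; `lines_to_df()` is a hypothetical helper name:

```r
# Minimal sketch: split every "name value" line at once and build the
# data frame in one step instead of filling it row by row.
# lines_to_df() is a hypothetical helper; it assumes a single space
# separates the name from the number on each line.
lines_to_df <- function(block) {
  parts <- strsplit(block, " ", fixed = TRUE)  # list of character vectors
  data.frame(
    name   = vapply(parts, `[`, character(1), 1),              # first field
    params = as.numeric(vapply(parts, `[`, character(1), 2)),  # second field
    stringsAsFactors = FALSE
  )
}
```

Call it on the character vector returned by `readLines()`, before any `as.list()` conversion, since `strsplit()` only accepts character input.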

But it turns out, per Ben's reply, that read.table can do the same thing in one line. I've reduced it down to the following one-liner:

    firstblock2 <- read.table(filepath, header = FALSE, sep = " ", skip = 5, nrows = 36)

This builds the data frame implicitly and does the dirty work for me. The documentation for read.table is here: https://stat.ethz.ch/r-manual/r-devel/library/utils/html/read.table.html
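Rather than hard-coding `skip = 5` for the first block and working out a separate offset for the block at line 744, the marker positions can be looked up first and each `skip` value computed from them. A minimal sketch, assuming every marker is followed by exactly one header line and then the data; `read_marked_blocks()` is a hypothetical helper name:

```r
# Minimal sketch: locate each marker line, then read the data block that
# starts two lines below it (marker, one header line, then the data rows).
# read_marked_blocks() is a hypothetical helper, not a base R function.
read_marked_blocks <- function(filepath, marker = ">>>em_prod_combs",
                               nrows = 36) {
  all_lines <- readLines(filepath, warn = FALSE)
  marker_lines <- which(all_lines == marker)  # exact matches only
  lapply(marker_lines, function(m) {
    # skip the marker line and the header line after it:
    # marker on line 4 -> skip = 5, so reading starts at line 6
    read.table(filepath, header = FALSE, sep = " ",
               skip = m + 1, nrows = nrows,
               col.names = c("name", "params"),
               stringsAsFactors = FALSE)
  })
}
```

With the layout above, the marker on line 4 gives `skip = 5` and the marker on line 742 gives `skip = 743`, so the first row read is line 744. Note that `skip = 5, nrows = 36` reads lines 6 through 41. Each `read.table()` call rescans the file from the top, which is harmless for a file of a few hundred lines; leaving out `sep = " "` would make it split on any run of whitespace instead of single spaces.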
