Saturday, 15 February 2014

awk to parse input with multiple conditions


This awk question is a continuation of an earlier post.

I apologize if I was supposed to add this to that thread instead. I have tried to modify the awk script below, but with no luck:

  awk 'NR==2 {split($2,a,"[_.>]"); b=substr(a[4],1,length(a[4])-1); print a[2]+0,b,b,substr(a[4],length(a[4])),a[5]}' OFS="\t" ${id}_position.txt > ${id}_parse.txt

There is more than one possible form of input that a user can supply; one of them is shown in the data sample. The field in bold (between the `**` markers) is the one that needs to be parsed:

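  Parsing rules:
  1. The number after NC_, with the leading zeros removed (there are four zeros here, but that is not always the case), up to the dot.
  2. g.### (the number after g.) and _### (the number after the underscore).
  3. The letters after del, followed by - (a hyphen is used in this position).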

data sample

  Input Variant               Errors  Chromosomal Variant                        Coding Variant(s)
  NM_004004.5:c.575_576delCA          **NC_000013.10:g.20763145_20763146delTG**  NM_004004.5:c.575_576delCA  XM_005266354.1:c.575_576delCA  XM_005266355.1:c.575_576delCA  XM_005266356.1:c.575_576delCA

desired output

  13 20763145 20763146 TG -

Thank you :).

TXR language:

  Input Variant@(skip)
  @(skip)NC_@{nc-raw}.@(skip)g.@{g-left}_@{g-right}del@{letters 2}@(skip)
  @(bind nc-num @(int-str nc-raw))
  @(output)
  @{nc-num 6} @{g-left 12} @{g-right 12} @{letters 6} -
  @(end)

Run:

  $ txr nc.txr data
  13 20763145 20763146 TG -

All on the command line:

  $ txr -c 'Input Variant@(skip)
  @(skip)NC_@{nc-raw}.@(skip)g.@{g-left}_@{g-right}del@{letters 2}@(skip)
  @(bind nc-num @(int-str nc-raw))
  @(output)
  @{nc-num 6} @{g-left 12} @{g-right 12} @{letters 6} -
  @(end)' data
  13 20763145 20763146 TG -
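Since the question asked for awk, here is a minimal awk sketch of the same parsing rules, for comparison. It is not the asker's script. It assumes the file has a header line followed by one data line, that the chromosomal field always has the shape NC_<digits>.<version>:g.<start>_<end>del<letters>, and that the `**` bold markers are not present in the real file. The sample file written by `printf` and the variable names (`chrom`, `start`, `stop`, `letters`) are made up for illustration.

```shell
# Hypothetical sample file reproducing the two-line input from the question
# (tab-separated, with an empty Errors column).
printf 'Input Variant\tErrors\tChromosomal Variant\tCoding Variant(s)\n' > data
printf 'NM_004004.5:c.575_576delCA\t\tNC_000013.10:g.20763145_20763146delTG\tNM_004004.5:c.575_576delCA\n' >> data

result=$(awk -v OFS='\t' '
NR == 2 {
    # Look for the NC_ field wherever it sits, instead of assuming $2.
    for (i = 1; i <= NF; i++) {
        if ($i !~ /^NC_[0-9]+\.[0-9]+:g\.[0-9]+_[0-9]+del[A-Za-z]+$/)
            continue
        # Split on "_", "." and ":" giving:
        # NC | 000013 | 10 | g | 20763145 | 20763146delTG
        split($i, a, "[_.:]")
        chrom = a[2] + 0                            # rule 1: adding 0 drops the leading zeros
        start = a[5]                                # rule 2: number after "g."
        stop = a[6];    sub(/del.*/, "", stop)      # rule 2: number after "_"
        letters = a[6]; sub(/.*del/, "", letters)   # rule 3: letters after "del"
        print chrom, start, stop, letters, "-"
    }
}' data)

echo "$result"
```

Matching the whole field against a pattern first means lines without a deletion in this form produce no output, rather than a half-parsed row. Other variant types (for example substitutions with `>`) would need their own branch.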
