awk - Extracting a string pattern
I have a file that looks like this:
chr1 156706559 rs8658 c,g 370.29 pass ac=1,1;af=0.500,0.500;an=2;db;dp=19;dels=0.00;fs=0.000;haplotypescore=0.0000;mleac=1,1;mleaf=0.500,0.500;mq=56.74;mq0=0;positive_train_site;qd=19.49;vqslod=6.27;culprit=fs;eff=3_prime_utr_variant(modifier||123|c.*123a>c|rrnad1|protein_coding|coding|nm_001142560.1|7) gt:ad:dp:gq:pl 1/2:0,7,12:19:99:503,293,272,210,0,183
chr10 22839463 rs10047326 c a,t 202.29 pass ac=1,1;af=0.500,0.500;an=2;db;dp=10;dels=0.00;fs=0.000;haplotypescore=0.0000;mleac=1,1;mleaf=0.500,0.500;mq=60.00;mq0=0;positive_train_site;qd=20.23;vqslod=10.48;culprit=fs;eff=intron_variant(modifier|||c.792+125g>t|pip4k2a|protein_coding|coding|nm_005028.4|7) gt:ad:dp:gq:pl 1/2:0,6,4:10:99:317,127,109,190,0,178
chr10 75673731 rs2227566 c g,t 735.29 pass ac=1,1;af=0.500,0.500;an=2;db;dp=33;dels=0.00;fs=0.000;haplotypescore=0.0000;mleac=1,1;mleaf=0.500,0.500;mq=55.90;mq0=0;qd=22.28;vqslod=6.01;culprit=fs;eff=splice_region_variant(low|||c.630c>g|plau|protein_coding|coding|nm_001145031.1|6) gt:ad:dp:gq:pl 1/2:0,8,25:33:99:913,734,710,179,0,110
chr12 54805753 rs1922254 g c,t 404.66 pass ac=1,1;af=0.500,0.500;an=2;db;dp=18;dels=0.00;fs=0.000;haplotypescore=0.0000;mleac=1,1;mleaf=0.500,0.500;mq=55.34;mq0=0;qd=22.48;vqslod=5.61;culprit=fs;eff=splice_region_variant(low|||c.219c>g|itga5|protein_coding|coding|nm_002205.2|1) gt:ad:dp:gq:pl 1/2:0,4,14:18:67:540,434,422,106,0,67
chr15 50150903 rs7497350 c a,t 3655.29 pass ac=1,1;af=0.500,0.500;an=2;db;dp=140;dels=0.00;fs=0.000;haplotypescore=1.8136;mleac=1,1;mleaf=0.500,0.500;mq=60.00;mq0=0;positive_train_site;qd=26.11;vqslod=10.96;culprit=fs;eff=3_prime_utr_variant(modifier||1488|c.*1488g>t|atp8b4|protein_coding|coding|nm_024837.3|28) gt:ad:dp:gq:pl 1/2:0,62,78:140:99:4121,2349,2187,1772,0,1553
chr16 11678403 rs8054918 t c,g 283.29 pass ac=1,1;af=0.500,0.500;an=2;db;dp=18;dels=0.00;fs=0.000;haplotypescore=0.0000;mleac=1,1;mleaf=0.500,0.500;mq=60.00;mq0=0;qd=15.74;vqslod=10.55;culprit=fs;eff=intron_variant(modifier|||c.-6+1599a>g|litaf|protein_coding|coding|nm_004862.3|1) gt:ad:dp:gq:pl 1/2:0,9,9:18:99:407,181,160,226,0,208
chr16 78503259 rs2738676 g a,c 166.31 pass ac=1,1;af=0.500,0.500;an=2;db;dp=9;dels=0.00;fs=0.000;haplotypescore=0.0000;mleac=1,1;mleaf=0.500,0.500;mq=60.00;mq0=0;qd=18.48;vqslod=10.91;culprit=qd;eff=intron_variant(modifier|||c.717+36610g>a|wwox|protein_coding|coding|nm_001291997.1|7) gt:ad:dp:gq:pl 1/2:0,3,6:9:80:279,181,172,98,0,80
chr17 4205297 rs1866174 c a,t 189.29 pass ac=1,1;af=0.500,0.500;an=2;db;dp=12;dels=0.00;fs=0.000;haplotypescore=0.0000;mleac=1,1;mleaf=0.500,0.500;mq=47.61;mq0=0;positive_train_site;qd=15.77;vqslod=3.80;culprit=mq;eff=intron_variant(modifier|||c.149+5019g>t|ube2g1|protein_coding|coding|nm_003342.4|2) gt:ad:dp:gq:pl 1/2:0,5,7:12:87:307,202,187,105,0,87
I would like to extract the "eff=..." part from each line. For the input above, the desired output is:
eff=3_prime_utr_variant
eff=intron_variant
eff=splice_region_variant
The output above covers only the first 3 lines.
What I have tried:
grep -no 'eff="[^"]*"' file.txt
It doesn't work; it prints nothing.
Kindly help.
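A minimal reproduction of why the attempt prints nothing, using one sample record with its INFO field shortened for readability (the shortening is only for illustration). The quoted pattern requires a double quote right after eff=, and the data contains none. The second command is an alternative sketch, not the accepted answer: it assumes the eff value is always followed by "(", as it is in every sample line; on a line where no "(" follows, [^(]* would run to the end of the line.

```shell
#!/bin/sh
# One sample record, shortened; the real file carries the full INFO field.
line='chr10 22839463 rs10047326 c a,t 202.29 pass dp=10;culprit=fs;eff=intron_variant(modifier|||c.792+125g>t|pip4k2a|protein_coding|coding|nm_005028.4|7)'

# The attempted pattern needs a double quote right after "eff=".
# There are no double quotes in the data, so grep matches nothing and exits with status 1.
printf '%s\n' "$line" | grep -o 'eff="[^"]*"' || echo 'no match'

# Dropping the quotes and stopping at the first "(" does match.
# Assumption: the eff value is always followed by "(", as in the sample data.
printf '%s\n' "$line" | grep -o 'eff=[^(]*'
```

The -n flag from the original attempt is left out here, since it only adds line numbers and has no effect on whether the pattern matches.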
You can use:
grep -o "eff=\w*" /root/testso
which gives:
eff=3_prime_utr_variant
eff=intron_variant
eff=splice_region_variant
eff=splice_region_variant
eff=3_prime_utr_variant
eff=intron_variant
eff=intron_variant
eff=intron_variant
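A self-contained way to try the answer's command, assuming a shell with GNU grep (for \w) and mktemp. The temporary file stands in for /root/testso and holds two of the sample records, shortened for readability; the file and its contents are illustrative only.

```shell
#!/bin/sh
# Write two of the sample records to a temporary file (stand-in for /root/testso).
# The INFO field is shortened; only the eff=... part matters here.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT

printf '%s\n' \
  'chr10 22839463 rs10047326 c a,t 202.29 pass dp=10;culprit=fs;eff=intron_variant(modifier|||c.792+125g>t|pip4k2a|protein_coding|coding|nm_005028.4|7)' \
  'chr12 54805753 rs1922254 g c,t 404.66 pass dp=18;culprit=fs;eff=splice_region_variant(low|||c.219c>g|itga5|protein_coding|coding|nm_002205.2|1)' \
  > "$sample"

# -o prints each match on its own line; \w is GNU grep's word-character class.
grep -o 'eff=\w*' "$sample"
# eff=intron_variant
# eff=splice_region_variant
```

Single quotes are used here so the shell passes the pattern through untouched; the double-quoted form from the answer behaves the same, because inside double quotes the shell keeps the backslash in \w and does not expand the *.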
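-o prints only the matching part of each line, one match per output line.
eff=\w* is the regex: it matches eff= literally, followed by a word character (a to z, A to Z, 0 to 9, or _), represented by \w. The * means that the preceding class (\w) is repeated 0 or more times.
This stops exactly where the annotation name ends, because the name is made only of word characters and is followed by "(", which is not one.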
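Since the question is tagged awk, here is a sketch of the same extraction in awk. \w is a GNU extension that not every awk understands, so the class is spelled out as [A-Za-z0-9_]. The function name extract_eff is hypothetical, chosen for this example. One difference from grep -o: match() reports only the first match on each line, which is enough here because each record has a single eff= field.

```shell
#!/bin/sh
# Hypothetical helper: print the "eff=<name>" part of each line that has one.
# match() sets RSTART and RLENGTH to the start and length of the first match;
# lines without an eff= field print nothing.
extract_eff() {
  awk 'match($0, /eff=[A-Za-z0-9_]*/) { print substr($0, RSTART, RLENGTH) }' "$@"
}

# Usage: one shortened sample record on stdin; file names can be passed instead.
printf '%s\n' 'chr1 156706559 rs8658 c,g 370.29 pass dp=19;culprit=fs;eff=3_prime_utr_variant(modifier||123|c.*123a>c|rrnad1|protein_coding|coding|nm_001142560.1|7)' \
  | extract_eff
# prints: eff=3_prime_utr_variant
```

With the full file, `extract_eff /root/testso` should print the same eight lines as the grep command.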
Note after editing: my first reply used the -n command-line option, which adds the line number to each line of output, because I was fixing the OP's command without taking the whole question into account. Thanks to @chrismae for pointing this out.
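To show what -n changes, here is a short comparison on two shortened sample records (trimmed for illustration), assuming GNU grep. With -n each match is prefixed by its line number and a colon, which is not part of the desired output.

```shell
#!/bin/sh
# Two shortened records; only the eff=... part is kept from the INFO field.
records='chr1 156706559 rs8658 c,g 370.29 pass eff=3_prime_utr_variant(modifier)
chr10 22839463 rs10047326 c a,t 202.29 pass eff=intron_variant(modifier)'

# With -n, each match is prefixed by its line number and a colon.
printf '%s\n' "$records" | grep -no 'eff=\w*'
# 1:eff=3_prime_utr_variant
# 2:eff=intron_variant

# Without -n, only the matched text is printed, as the question asks.
printf '%s\n' "$records" | grep -o 'eff=\w*'
# eff=3_prime_utr_variant
# eff=intron_variant
```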