Thursday, 15 May 2014

awk - How to track lines in a large log file that don't appear in the expected order?




I have a large log file that includes lines in the format:

id_number message_type

Here is an illustration of a log file where all lines appear in the expected order:

1 a
1 b
1 c
2 a
2 b
2 c

However, not all lines appear in the expected order in my log file, and I'd like a list of the id numbers whose lines don't appear in the expected order. For the following file:

1 a
1 c
1 b
2 a
2 b
2 c

I would like output that indicates that id number 1 has lines that don't appear in the expected order. How can I do this using grep, sed, or awk?

This works for me:

awk -v "a=abc" 'substr(a, b[$1]++ + 1, 1) != $2 {print $1}' logfile

When this is run, the id number of each out-of-order line is printed. If there are no out-of-order lines, nothing is printed.
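As a quick check, the command can be run against the out-of-order sample from the question. This sketch assumes one record per line and lowercase message types a, b, c (matching a=abc); the temporary file and the variable name out are only for illustration.

```shell
# Write the out-of-order sample to a temporary file (path chosen by mktemp).
logfile=$(mktemp)
printf '%s\n' '1 a' '1 c' '1 b' '2 a' '2 b' '2 c' > "$logfile"

# Run the one-liner; each out-of-order line prints its id.
out=$(awk -v "a=abc" 'substr(a, b[$1]++ + 1, 1) != $2 {print $1}' "$logfile")
printf '%s\n' "$out"

rm -f "$logfile"
```

Here id 1 is printed twice, because both its c line and its b line arrive at the wrong position. Id 2 is not printed at all.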

How it works

-v "a=abc"

This defines the variable a as the list of characters in the expected order.

substr(a, b[$1]++ + 1, 1) != $2 {print $1}

For each id number, the array b keeps track of where we are in the sequence. Initially, b is 0 for all ids. With that initial value, b[$1]==0, the expression substr(a, b[$1] + 1, 1) returns a, which is our first expected output. The condition substr(a, b[$1] + 1, 1) != $2 checks whether the expected output, from the substr function, differs from the actual output shown in the second field, $2. If they differ, the id value, $1, is printed.
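To see which character each counter value maps to, the substr lookup can be tried on its own. This is a small illustration rather than part of the original answer, and the variable name chars is only for the example. awk's substr is 1-indexed, which is why 1 is added to the counter, and a position past the end of the string yields an empty string.

```shell
# Print the expected character for counter values 0, 1, 2 and 3.
# The brackets make the empty result for the fourth lookup visible.
chars=$(awk -v "a=abc" 'BEGIN {
    for (i = 0; i <= 3; i++)
        printf "[%s]", substr(a, i + 1, 1)
    print ""
}')
printf '%s\n' "$chars"
```

This prints [a][b][c][]. Because the fourth lookup is empty, a fourth line for the same id never matches and is reported as out of order.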

After the substr expression is computed, the trailing ++ in the expression b[$1]++ increments the value of b[$1] by 1, so that b[$1] is ready for the next time id $1 is encountered.
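The same logic can be spelled out step by step, which may be easier to adapt. The following is a sketch that should flag the same lines as the one-liner while also reporting what was expected; the names expected, seen, pos, want, and trace are made up for this example.

```shell
trace=$(printf '%s\n' '1 a' '1 c' '1 b' '2 a' | awk -v "expected=abc" '{
    pos  = ++seen[$1]                 # 1-based position of this line within its id
    want = substr(expected, pos, 1)   # character expected at that position
    if (want != $2)
        printf "id %s: line %d of this id is %s, expected %s\n", $1, pos, $2, want
}')
printf '%s\n' "$trace"
```

With this input, the second and third lines of id 1 are reported, each with the message type that was expected in its place.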

Refinement

The above prints the id number every time an out-of-order line is encountered. If you want each bad id printed only once, not multiple times, use:

awk -v "a=abc" 'substr(a, b[$1]++ + 1, 1) != $2 {bad[$1]++} END{for (n in bad) print n}' logfile
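If the ids should be reported as soon as they are first found to be bad, rather than at the end of the file, one alternative (not from the original answer) is the common !seen[key]++ idiom. Note also that for (n in bad) visits ids in an unspecified order, whereas this variant prints them in order of first occurrence. The array name bad is reused from above; the sample data and the variable once are illustrative.

```shell
once=$(printf '%s\n' '1 a' '1 c' '1 b' '2 b' '2 a' '3 a' | awk -v "a=abc" '
    # First test: is this line out of order? The b[$1]++ increment runs on every line.
    # Second test: is this the first time this id has been flagged?
    substr(a, b[$1]++ + 1, 1) != $2 && !bad[$1]++ { print $1 }
')
printf '%s\n' "$once"
```

In this sample, ids 1 and 2 each have two out-of-order lines but are printed once each, and id 3 is not printed.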

