Sunday, 15 June 2014

algorithm - Find the most hit url in a large file -




I was reading this Yelp interview question on Glassdoor:

"We have a big log file, 5 GB. Each line of the log file contains a URL that a user has visited on our site. We want to figure out what the 100 most popular URLs visited by our users are."

and one of the solutions is:

cat log | sort | uniq -c | sort -k2n | head 100

Can someone explain to me the purpose of the second sort (sort -k2n)?

Thanks!

It looks like the stages are:

1) feed the log file into the filter pipeline

2) bring identical filenames together

3) count the number of occurrences of each different filename

4) sort the pairs (filename, number of occurrences) by number of occurrences

5) print out the 100 most common filenames
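
The stages above can be sketched as a small runnable pipeline. This is only an illustration under stated assumptions, not the interviewer's intended answer: the sample log contents and the variable names (log, top, most_hit) are made up for the example. Note that uniq -c writes the count in the first column, so this sketch sorts on key 1, numerically and in reverse, and it passes the line limit to head with -n.

```shell
# Hypothetical sample log: one visited URL per line (contents are made up).
log=$(mktemp)
printf '%s\n' /home /biz/a /home /search /biz/a /home > "$log"

# sort         : brings identical lines together
# uniq -c      : collapses each run of identical lines to "count line"
# sort -k1,1nr : orders by the count column, largest first
# head -n 100  : keeps at most the 100 most frequent lines
top=$(sort "$log" | uniq -c | sort -k1,1nr | head -n 100)
echo "$top"

# The URL on the first output line is the most frequently hit one.
most_hit=$(echo "$top" | head -n 1 | awk '{print $2}')
echo "most hit: $most_hit"

rm -f "$log"
```

For a 5 GB file the first sort is the expensive step, since it has to order every line before uniq can count the runs.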

algorithm sorting unix
