Wednesday, 15 April 2015

python - Extract top 20 (descending) rows of a CSV file with respect to a column




I have a CSV file with 3 columns that looks like this:

a,b,c
1,1,2
1,3,5
1,5,7
.
.
2,3,4
2,1,5
2,4,7

I'd like the output to be:

a,b,c
1,5,7
1,3,5
1,1,2
.
.
2,4,7
2,3,4
2,1,5

i.e., for each element in column a, I'd like to keep only the top 20 rows (those with the 20 highest 'b' values). Please excuse the poor explanation. This is what I've tried so far, but it doesn't give me the required output:

import csv
import heapq
from itertools import islice

csvout = open("output.csv", "w")
writer = csv.writer(csvout, delimiter=',', quotechar='"', lineterminator='\n', quoting=csv.QUOTE_MINIMAL)
freqs = {}
with open('input.csv') as fin:
    csvin = csv.reader(fin)
    # skip the header row and prepend the numeric 'b' value to each row
    rows_with_mut = ([float(row[1])] + row for row in islice(csvin, 1, None) if row[2])
    for row in rows_with_mut:
        cnt = freqs.setdefault(row[0], [[]] * 20)
        heapq.heappushpop(cnt, row)
for assay_id, vals in freqs.iteritems():  # Python 2; use .items() on Python 3
    output = [row[1:] for row in sorted(filter(None, vals), reverse=True)]
    writer.writerows(output)

At the risk of being downvoted, you could use a simple bash script:

#!/bin/bash
all=$(cat)                                           # read stdin
echo "$all" | head -n 1                              # echo the header of the file
allt=$(echo "$all" | tail -n +2)                     # remove the header from memory
avl=$(echo "$allt" | cut -d ',' -f 1 | sort | uniq)  # find all unique values in column a
for av in $avl                                       # iterate over these values
do
    # for each value, find all lines with this value, sort them
    # numerically (descending) on column b only, and return the top 20
    echo "$allt" | grep "^$av," | sort -t$',' -k2,2nr | head -n 20
done

You can run it on the command line with:

bash script.sh < data.csv

It will print the result to the terminal.

Example:

If one uses your sample values (without the "dot" rows), one obtains:

user@machine ~> bash script.sh < data.csv
a,b,c
1,5,7
1,3,5
1,1,2
2,4,7
2,3,4
2,1,5
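
For comparison, the same per-group selection could be sketched in Python, the language the question asked about. This is only a minimal sketch and not part of the original answer: the function name top_n_per_group and its parameters are made up for illustration, it assumes Python 3.7+ and that column b always parses as a number, and unlike the bash script it emits the groups in order of first appearance rather than in sorted order.

```python
import csv
import heapq
import io
from collections import defaultdict


def top_n_per_group(rows, n=20, group_col=0, value_col=1):
    """For each distinct value in group_col, return the n rows with the
    largest numeric value in value_col, highest first.
    (Hypothetical helper, named for illustration only.)"""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(row)
    result = []
    for key in groups:  # dicts keep insertion order on Python 3.7+
        result.extend(
            heapq.nlargest(n, groups[key], key=lambda r: float(r[value_col]))
        )
    return result


# Sample data from the question (without the "dot" rows).
sample = "a,b,c\n1,1,2\n1,3,5\n1,5,7\n2,3,4\n2,1,5\n2,4,7\n"
reader = csv.reader(io.StringIO(sample))
header = next(reader)                      # keep the header row aside

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(header)
writer.writerows(top_n_per_group(reader))  # n defaults to 20
print(out.getvalue(), end="")
```

On the sample data this prints the header followed by the rows 1,5,7 / 1,3,5 / 1,1,2 / 2,4,7 / 2,3,4 / 2,1,5. For a real file, the io.StringIO objects would be replaced by files opened with open(..., newline="").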

If you want to write the bash script's result to a file (say data2.csv), use:

bash script.sh < data.csv > data2.csv

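Don't read from and write to the same file: don't run bash script.sh < data.csv > data.csv. The shell truncates data.csv for the output redirection before the script gets to read it, so the data would be lost.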
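
If the result really has to end up in the original file, one common workaround is to write to a temporary file first and only then move it over the original. The following is a minimal Python sketch of that idea, not something from the original answer: rewrite_file and its transform argument are hypothetical names, and the sketch assumes the file fits in memory.

```python
import os
import tempfile


def rewrite_file(path, transform):
    """Replace the contents of path with transform(old_text).
    The input is read completely before anything is written, and the
    output goes to a temporary file that is then moved over the original.
    (Hypothetical helper, named for illustration only.)"""
    with open(path, newline="") as src:
        text = src.read()

    # Create the temporary file in the same directory so that the final
    # os.replace stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="") as dst:
            dst.write(transform(text))
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)  # clean up the temporary file on failure
        raise
```

Here transform would be any function that takes the old file contents as a string and returns the new contents.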

