python - Extract top 20 (descending) rows of a CSV file with respect to a column
I have a CSV file with 3 columns that looks like this:

    a,b,c
    1,1,2
    1,3,5
    1,5,7
    .
    .
    2,3,4
    2,1,5
    2,4,7

I'd like the output to be:

    a,b,c
    1,5,7
    1,3,5
    1,1,2
    .
    .
    2,4,7
    2,3,4
    2,1,5

i.e., for each element in column a, I'd like to keep only the top 20 rows (those with the 20 highest 'b' values). Please excuse my poor explanation. This is what I've tried so far, but it doesn't give me the required output:

    import csv
    import heapq
    from itertools import islice

    csvout = open("output.csv", "w")
    writer = csv.writer(csvout, delimiter=',', quotechar='"', lineterminator='\n', quoting=csv.QUOTE_MINIMAL)
    freqs = {}
    with open('input.csv') as fin:
        csvin = csv.reader(fin)
        rows_with_mut = ([float(row[1])] + row for row in islice(csvin, 1, None) if row[2])
        for row in rows_with_mut:
            cnt = freqs.setdefault(row[0], [[]] * 20)
            heapq.heappushpop(cnt, row)
    for assay_id, vals in freqs.iteritems():
        output = [row[1:] for row in sorted(filter(None, vals), reverse=True)]
        writer.writerows(output)

At the risk of being downvoted, I would use a simple bash script:

    #!/bin/bash
    all=$(cat)  # read from stdin
    echo "$all" | head -n 1  # echo the header of the file
    allt=$(echo "$all" | tail -n +2)  # remove the header from memory
    avl=$(echo "$allt" | cut -d ',' -f 1 | sort | uniq)  # find all unique values in the a column
    for av in $avl  # iterate over these values
    do
        # for each value, find all lines with that value, sort them, and return the top 20
        echo "$allt" | grep "^$av," | sort -t$',' -k2nr | head -n 20
    done

You can run this on the command line with:

    bash script.sh < data.csv

It will print the result to the terminal.

Example:

If one uses your sample values (without the "dot" rows), one obtains:

    user@machine ~> bash script.sh < data.csv
    a,b,c
    1,5,7
    1,3,5
    1,1,2
    2,4,7
    2,3,4
    2,1,5

If you want to write the result to a file (say data2.csv), use:

    bash script.sh < data.csv > data2.csv

Don't read from and write to the same file: don't run bash script.sh < data.csv > data.csv, because the shell truncates data.csv before the script gets to read it.
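
For anyone who would rather stay in Python, the same idea as the bash script (group the rows by column a, then keep the 20 rows with the largest b in each group) can be sketched with heapq.nlargest. This is only a rough sketch, not the asker's or the answerer's code: the function names top_n_per_group and filter_csv are made up for illustration, and it assumes column b always parses as a number and that the whole file fits in memory. As far as I can tell, the code in the question goes wrong because row[0] there is the prepended float(row[1]), so the dictionary ends up keyed by b rather than by a.

```python
import csv
import heapq
import io
from collections import defaultdict


def top_n_per_group(rows, n=20):
    """Return up to n rows per value of column 0, highest column 1 first."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)
    result = []
    # Keys are sorted as strings, like `sort | uniq` in the bash script.
    for key in sorted(groups):
        # nlargest returns its result ordered from largest to smallest.
        result.extend(heapq.nlargest(n, groups[key], key=lambda r: float(r[1])))
    return result


def filter_csv(fin, fout, n=20):
    """Copy the header, then write the top n rows of each group to fout."""
    reader = csv.reader(fin)
    writer = csv.writer(fout, lineterminator="\n")
    header = next(reader, None)
    if header is None:  # empty input: nothing to write
        return
    writer.writerow(header)
    # Skip blank lines, which csv.reader yields as empty lists.
    writer.writerows(top_n_per_group((row for row in reader if row), n))


# Demo with the sample rows from the question (dot rows omitted).
sample = "a,b,c\n1,1,2\n1,3,5\n1,5,7\n2,3,4\n2,1,5\n2,4,7\n"
out = io.StringIO()
filter_csv(io.StringIO(sample), out)
print(out.getvalue(), end="")
```

With real files you would pass open file objects instead of the StringIO buffers, e.g. open('input.csv', newline='') for reading and open('output.csv', 'w', newline='') for writing. As with the bash version, use a different file for the output than for the input.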
python sorting csv highest