Python conditional filtering in csv file
Please help! I have tried different things/packages. I am writing a programme that takes in 4 inputs and returns the writing score statistics of a grouping based on that combination of inputs, read from a csv file. This is my first project, so I would appreciate any insights/hints/tips!
Here is a sample of the csv (it has 200 rows in total):
    id   gender  ses     schtyp  prog      write
    70   male    low     public  general   52
    121  female  middle  public  vocation  68
    86   male    high    public  general   33
    141  male    high    public  vocation  63
    172  male    middle  public  academic  47
    113  male    middle  public  academic  44
    50   male    middle  public  general   59
    11   male    middle  public  academic  34
    84   male    middle  public  general   57
    48   male    middle  public  academic  57
    75   male    middle  public  vocation  60
    60   male    middle  public  academic  57
Here is what I have so far:
    import csv
    import numpy

    # read the file and skip the header row
    with open('scores.csv', newline='') as f:
        csv_file_object = csv.reader(f)
        header = next(csv_file_object)

        # load the remaining rows into an array for processing
        data = []
        for row in csv_file_object:
            data.append(row)
    data = numpy.array(data)

    # ask for the inputs
    gender = input('Enter gender [male/female]: ')
    schtyp = input('Enter school type [public/private]: ')
    ses = input('Enter socioeconomic status [low/middle/high]: ')
    prog = input('Enter programme status [general/vocation/academic]: ')

    # make them lower-case strings
    prog = prog.lower()
    gender = gender.lower()
    schtyp = schtyp.lower()
    ses = ses.lower()
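What I am missing is how to filter the data and get the stats for one specific group only. For example, say I input male, public, middle, and academic: I'd want to get the average writing score for that subset. I tried the groupby function from pandas, but that only gets you stats for broad groups (such as public vs private). I also tried DataFrame from pandas, but that only gets me filtering on 1 input and I'm not sure how to get the writing scores. Any hints would be appreciated!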
Agreeing with Ramon, pandas is the way to go, and it has extraordinary filtering/sub-setting capability once you get used to it. It can be tough to first wrap your head around (or at least it was for me!), so I dug up some examples of the sub-setting you need from some of my old code. The variable itu below is a pandas DataFrame with info on various countries over time.
    # subsetting using True/False:
    subset = itu['cntryname'] == 'albania'  # returns True/False values
    itu[subset]  # returns 1x144 DataFrame of only the info for Albania
    itu[itu['cntryname'] == 'albania']  # one-line command, equivalent to the above 2 lines

    # pandas has many built-in functions like .isin() to provide params to filter on
    itu[itu.cntrycode.isin(['usa','fra'])]  # returns where itu['cntrycode'] is 'usa' or 'fra'
    itu[itu.year.isin([2000,2001,2002])]  # returns all of itu for only the years 2000-2002

    # advanced subsetting can include logical operations:
    itu[itu.cntrycode.isin(['usa','fra']) & itu.year.isin([2000,2001,2002])]  # both of the above at the same time

    # use .loc with 2 elements to simultaneously select by row/index & column:
    itu.loc['usa','cntryname']
    itu.iloc[204,0]
    itu.loc[['usa','bhs'], ['cntryname', 'year']]
    itu.iloc[[204, 13], [0, 1]]

    # you can do many operations at once, but this reduces the "readability" of the code
    itu[itu.cntrycode.isin(['usa','fra']) & itu.year.isin([2000,2001,2002])].loc[:, ['cntrycode','cntryname','year','mpen','fpen']]

    # finally, if you're comfortable with using map() and list comprehensions, you can do
    # some advanced subsetting that includes evaluations & functions to determine which
    # elements you want to select from the whole, such as all countries whose name begins
    # with "united":
    criterion = itu['cntryname'].map(lambda x: x.startswith('united'))
    itu[criterion]['cntryname']  # gives uae, uk, &
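Applied to the scores data from the question, the same boolean-mask idea might look roughly like the sketch below. It is only an illustration under a few assumptions: it uses Python 3 and pandas, it loads the twelve sample rows from the question from an inline string rather than the real scores.csv (whose delimiter isn't shown), and the names SAMPLE, scores, write_stats and by_group are made up for this example.

```python
import io

import pandas as pd

# The twelve sample rows from the question, whitespace-separated here.
# With the real file you would read 'scores.csv' with pd.read_csv instead.
SAMPLE = """\
id gender ses schtyp prog write
70 male low public general 52
121 female middle public vocation 68
86 male high public general 33
141 male high public vocation 63
172 male middle public academic 47
113 male middle public academic 44
50 male middle public general 59
11 male middle public academic 34
84 male middle public general 57
48 male middle public academic 57
75 male middle public vocation 60
60 male middle public academic 57
"""

scores = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+")


def write_stats(df, gender, ses, schtyp, prog):
    """Summary statistics of 'write' for one combination of the four inputs."""
    # Each comparison yields a True/False Series; & keeps rows matching all four.
    mask = (
        (df["gender"] == gender.lower())
        & (df["ses"] == ses.lower())
        & (df["schtyp"] == schtyp.lower())
        & (df["prog"] == prog.lower())
    )
    # Select only the 'write' column of the matching rows and summarise it.
    return df.loc[mask, "write"].describe()


stats = write_stats(scores, "male", "middle", "public", "academic")
print(stats)

# groupby also accepts a list of columns, which gives the statistics for
# every combination at once instead of one broad group at a time.
by_group = scores.groupby(["gender", "ses", "schtyp", "prog"])["write"].agg(
    ["count", "mean"]
)
print(by_group)
```

On the sample rows, the male / middle / public / academic subset has five students with a mean writing score of 47.8. The values typed in at the prompts can be passed straight into write_stats in place of the hard-coded strings.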
python csv pandas