Wednesday, 15 January 2014

dataframes - Pandas: Joining information from multiple data frames, array -




Suppose I have three data structures:

a data frame df1 with columns a, b, c, of length 10000

a data frame df2 with columns a and misc. other columns, of length 8000

a Python list labels of length 8000, where the element at index i corresponds to row i in df2

I'm trying to create a data frame such that, for every element in df2.a, I grab the relevant row from df1 and the corresponding entry of labels, and pair that information. It's possible that an entry in df2.a is not present in df1.a.

Currently, I'm doing this through a for i in xrange(len(df2)) loop, checking whether df2.a.iloc[i] is present in df1.a, and if it is, I store df1.a, df1.b, df1.c, labels[i] in a dictionary, with the first element as the key and the rest of the elements as a list.
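For reference, the loop described above might look roughly like the following. This is only a reconstruction from the description: the sample values and the name `result` are assumptions, and `range` is used in place of Python 2's `xrange`.

```python
import pandas as pd

# Hypothetical sample data with the shapes described above.
df1 = pd.DataFrame({"a": ["uid1", "uid2", "uid5"],
                    "b": ["bob", "jack", "cat"],
                    "c": ["rock", "pop", "country"]})
df2 = pd.DataFrame({"a": ["uid10", "uid3", "uid1"]})
labels = ["label10", "label3", "label1"]

result = {}  # hypothetical name for the output dictionary
for i in range(len(df2)):  # xrange in Python 2
    key = df2.a.iloc[i]
    match = df1[df1.a == key]  # scans all of df1 on every iteration
    if not match.empty:
        row = match.iloc[0]
        # first element is the key, the rest are stored as a list
        result[key] = [row["b"], row["c"], labels[i]]
```

Each iteration scans df1 in full, so the work grows with len(df1) * len(df2), which is a likely reason the loop feels slow at these sizes.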

Is there a more efficient way to do this, and to store the outputs df1.a, df1.b, df1.c, labels[i] as the four columns of a dataframe? The for loop is really slow.

Sample data:

    df1
    a       b       c
    'uid1'  'bob'   'rock'
    'uid2'  'jack'  'pop'
    'uid5'  'cat'   'country'
    ...

    df2
    a
    'uid10'
    'uid3'
    'uid1'
    ...

    labels
    [label10, label3, label1, ...]

OK, as far as I understand, the following should work:

    # create a new column for the labels; this aligns on the index
    df2['labels'] = labels
    # merge in the rows from df1 on column 'a'
    df2 = df2.merge(df1, on='a', how='left')

Example:

    import io
    import pandas as pd

    # set up the sample data
    temp = """a b c
    'uid1' 'bob' 'rock'
    'uid2' 'jack' 'pop'
    'uid5' 'cat' 'country'"""
    temp1 = """a
    'uid10'
    'uid3'
    'uid1'"""
    labels = ['label10', 'label3', 'label1']
    df1 = pd.read_csv(io.StringIO(temp), sep=r'\s+')
    df2 = pd.read_csv(io.StringIO(temp1))

    In [97]:
    # do the work
    df2['labels'] = labels
    df2 = df2.merge(df1, on='a', how='left')
    df2

    Out[97]:
             a   labels      b       c
    0  'uid10'  label10    NaN     NaN
    1   'uid3'   label3    NaN     NaN
    2   'uid1'   label1  'bob'  'rock'

This should be considerably faster than looping.
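The left merge keeps every row of df2 and fills b and c with NaN where there is no match. If only the entries of df2.a that are actually present in df1.a are wanted, as four columns a, b, c, labels, one possible variant is an inner merge. The sketch below uses hypothetical sample values and the assumed name `paired`; it also assumes the values in df1.a are unique, since duplicates there would produce one output row per match.

```python
import pandas as pd

# Hypothetical sample data with the shapes described in the question.
df1 = pd.DataFrame({"a": ["uid1", "uid2", "uid5"],
                    "b": ["bob", "jack", "cat"],
                    "c": ["rock", "pop", "country"]})
df2 = pd.DataFrame({"a": ["uid10", "uid3", "uid1"]})
labels = ["label10", "label3", "label1"]

paired = (
    df2[["a"]]                        # ignore df2's other columns
    .assign(labels=labels)            # labels[i] goes with row i of df2
    .merge(df1, on="a", how="inner")  # keep only keys found in df1.a
    [["a", "b", "c", "labels"]]       # order the four requested columns
)
print(paired)
```

Using assign on a column subset leaves df2 itself unmodified, which may be convenient if df2 is needed again later.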

pandas dataframes
