dataframes - Pandas: Joining information from multiple data frames, array
Suppose I have 3 data structures:
a data frame df1, with columns a, b, c, of length 10000
a data frame df2, with columns a, misc. columns..., of length 8000
a Python list labels of length 8000, where the element at index i corresponds to row i in df2.
I'm trying to create a data frame from this information that, for every element in df2.a, grabs the relevant row from df1 and the matching entry from labels, so the information is paired up. It's possible that an entry in df2.a is not present in df1.a.
Currently, I'm doing this through a for i in xrange(len(df2)) loop, checking if df2.a.iloc[i] is present in df1.a, and if it is, I store df1.a, df1.b, df1.c, labels[i] in a dictionary, with the first element as the key and the rest of the elements as a list.
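For reference, the loop described above might look roughly like the sketch below. It is an assumption about the approach, not the exact code: it uses Python 3's range instead of xrange, builds small frames from the sample data in this question, and the dictionary name result is hypothetical.

```python
import pandas as pd

# Small frames built from the sample data in the question
df1 = pd.DataFrame({'a': ['uid1', 'uid2', 'uid5'],
                    'b': ['bob', 'jack', 'cat'],
                    'c': ['rock', 'pop', 'country']})
df2 = pd.DataFrame({'a': ['uid10', 'uid3', 'uid1']})
labels = ['label10', 'label3', 'label1']

result = {}  # hypothetical name: key is df1.a, value is [b, c, label]
for i in range(len(df2)):
    key = df2['a'].iloc[i]
    match = df1[df1['a'] == key]  # scans all of df1 on every iteration
    if not match.empty:
        row = match.iloc[0]
        result[key] = [row['b'], row['c'], labels[i]]
```

Each iteration scans the whole of df1, which is why this gets slow at 8000 x 10000 rows.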
Is there a more efficient way to do this, and to store the outputs df1.a, df1.b, df1.c, labels[i] in a 4-column dataframe? The for loop is really slow.
Sample data:

df1
a       b       c
'uid1'  'bob'   'rock'
'uid2'  'jack'  'pop'
'uid5'  'cat'   'country'
...

df2
a
'uid10'
'uid3'
'uid1'
...

labels
[label10, label3, label1, ...]
OK, from what I understand, the following should work:

# create a new column for the labels; it lines up with df2's index
df2['labels'] = labels
# merge in the rows from df1 on column 'a'
df2 = df2.merge(df1, on='a', how='left')
Example:

# set up sample data
import io
import pandas as pd

temp = """a b c
'uid1' 'bob' 'rock'
'uid2' 'jack' 'pop'
'uid5' 'cat' 'country'"""
temp1 = """a
'uid10'
'uid3'
'uid1'"""
labels = ['label10', 'label3', 'label1']
df1 = pd.read_csv(io.StringIO(temp), sep=r'\s+')
df2 = pd.read_csv(io.StringIO(temp1))

In [97]:
# do the work
df2['labels'] = labels
df2 = df2.merge(df1, on='a', how='left')
df2

Out[97]:
         a   labels      b       c
0  'uid10'  label10    NaN     NaN
1   'uid3'   label3    NaN     NaN
2   'uid1'   label1  'bob'  'rock'
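This should be considerably faster than looping, since the merge matches keys in one vectorized operation instead of scanning df1 once per row of df2.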
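The left merge keeps rows of df2 that have no match in df1 (filled with NaN). If only the matched rows are wanted, as in the original loop, an inner merge may be closer to the 4-column result asked for. This is a minimal sketch under that assumption; the name paired is hypothetical, and only column a is taken from df2 so that its other columns cannot clash with b or c.

```python
import pandas as pd

# Small frames built from the sample data in the question
df1 = pd.DataFrame({'a': ['uid1', 'uid2', 'uid5'],
                    'b': ['bob', 'jack', 'cat'],
                    'c': ['rock', 'pop', 'country']})
df2 = pd.DataFrame({'a': ['uid10', 'uid3', 'uid1']})
labels = ['label10', 'label3', 'label1']

# assign() returns a new frame, so df2 itself is left unchanged
paired = (
    df2[['a']]
    .assign(labels=labels)             # labels[i] goes with row i of df2
    .merge(df1, on='a', how='inner')   # keep only keys present in both frames
    [['a', 'b', 'c', 'labels']]        # the 4 requested columns, in order
)
```

Note that if df1.a contains duplicate keys, the merge returns one row per matching pair, so the result can have more rows than df2.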