python - Why is pandas DataFrame more expensive than numpy ndarray?
I am benchmarking pandas DataFrame creation and found that it is more expensive than numpy ndarray creation.
Benchmark code:
from timeit import Timer

setup = """
import numpy as np
import pandas as pd
"""

numpy_code = """
info = np.zeros(shape=(360,),dtype=[('a', 'f4'),('b', 'f4'),('c', 'f4')])
"""

pandas_code = """
df = pd.DataFrame(np.zeros(shape=(360,),dtype=[('a', 'f4'),('b', 'f4'),('c', 'f4')]))
"""

# Python 2 print statements, matching the system details given in the post
print "numpy",min(Timer(numpy_code,setup=setup).repeat(10,10))*10**6,"micro-seconds"
print "pandas",min(Timer(pandas_code,setup=setup).repeat(10,10))*10**6,"micro-seconds"
The output:
numpy 17.5073728315 micro-seconds
pandas 1757.9817013 micro-seconds
I was wondering if someone could help me understand why pandas DataFrame creation is more expensive than ndarray construction. And if I am doing something wrong, can you please help me improve the performance?
System details:
pandas version: 0.12.0
numpy version: 1.9.0
Python 2.7.6 (32-bit) running on Windows 7
For a homogeneous dtyped numpy array, the performance difference in creation is quite minuscule, as no copying is done and the array is simply passed through.
However, for heterogeneous dtyped numpy arrays, the data is segregated by dtype (which may involve copying, especially if the input has non-contiguous dtypes) into separate blocks, each holding a single dtype (as a numpy array).
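As a rough illustration of that splitting, the sketch below builds a structured array like the one in the question and checks that each field comes out as its own column with its own dtype. The mixed dtypes and the variable name `rec` are choices made for this example, not taken from the post. It only inspects the public result and does not show how pandas arranges the blocks internally.

```python
import numpy as np
import pandas as pd

# Structured (record) array with three named fields.
# The int field "b" is an assumption added to make the dtypes differ.
rec = np.zeros(shape=(360,), dtype=[("a", "f4"), ("b", "i4"), ("c", "f4")])

# Each named field is pulled out and becomes a separate column.
df = pd.DataFrame(rec)

print(df.shape)   # (360, 3)
print(df.dtypes)  # one dtype per column: float32, int32, float32
```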
Other types of data trigger different amounts of checks (e.g. lists are scrutinized to see if they are 1-d, 2-d, etc.), and various checks relating to coercions of datetime-likes occur.
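The effect of those checks can be seen from the outside with a small sketch like the following. The sample values and the column name `when` are made up for illustration. It shows only the resulting shapes and dtype, not the checks themselves.

```python
from datetime import datetime

import pandas as pd

# A flat list is treated as a single column; a list of lists as rows.
one_d = pd.DataFrame([1, 2, 3])
two_d = pd.DataFrame([[1, 2], [3, 4], [5, 6]])
print(one_d.shape)  # (3, 1)
print(two_d.shape)  # (3, 2)

# Python datetime objects are coerced to a datetime64 column
# (dtype kind "M"); the exact resolution depends on the pandas version.
stamps = pd.DataFrame({"when": [datetime(2014, 1, 1), datetime(2014, 1, 2)]})
print(stamps["when"].dtype.kind)  # M
```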
The reasons for the upfront dtype separation are simple. You can then perform operations that operate differently on different dtypes without run-time separation (and the corresponding slicing performance issues).
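One user-visible consequence is that columns can be picked out by dtype without inspecting the values. The sketch below uses `select_dtypes` on a small mixed frame. The column names and values are invented for the example, and it says nothing about how the blocks are used internally.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "x": np.array([1.0, 2.0, 3.0], dtype="f4"),  # float32 column
    "n": np.array([1, 2, 3], dtype="i8"),        # int64 column
    "label": ["a", "b", "c"],                    # string column
})

# Keep only the numeric columns, then operate on them together.
numeric = df.select_dtypes(include="number")
totals = numeric.sum()

print(list(numeric.columns))  # ['x', 'n']
print(totals)
```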
To be honest, this is a very, very slight perf hit to take for all of the attendant advantages of using a DataFrame, namely a consistent, intuitive API that handles null data and different dtypes intelligently.
Homogeneous case, which involves no copying:
In [41]: %timeit np.ones((10000,100))
1000 loops, best of 3: 399 per loop
In [42]: arr = np.ones((10000,100))
In [43]: %timeit DataFrame(arr)
10000 loops, best of 3: 65.9 per loop
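To compare the two construction paths on your own setup, a sketch along these lines may help. It stores the same three float32 fields from the question once as a structured array and once as a plain 2-D array. The helper names `from_records` and `from_plain` are hypothetical. No timings are quoted, since they depend on the pandas version and the machine.

```python
from timeit import Timer

import numpy as np
import pandas as pd

# Same three float32 fields as in the question, stored two ways.
rec = np.zeros(shape=(360,), dtype=[("a", "f4"), ("b", "f4"), ("c", "f4")])
flat = np.zeros(shape=(360, 3), dtype="f4")

def from_records():
    # Structured input: the fields have to be separated into columns.
    return pd.DataFrame(rec)

def from_plain():
    # Homogeneous 2-D input with explicit column names.
    return pd.DataFrame(flat, columns=["a", "b", "c"])

for name, fn in [("structured", from_records), ("plain 2-D", from_plain)]:
    # Best of 5 repeats, 100 calls each, reported per call.
    best = min(Timer(fn).repeat(5, 100)) / 100 * 10**6
    print(name, best, "micro-seconds per call")
```

If all fields share one dtype, as they do in the question, the plain 2-D form is the one the answer describes as being passed through.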
python numpy pandas