Friday, 15 June 2012

python - Use numpy.average with weights for resampling a pandas array



I need to resample some data with NumPy's weighted-average function, and it just doesn't work.

This is my test case:

import datetime
import numpy as np
import pandas as pd

# five irregularly spaced timestamps within ten minutes
time_vec = [datetime.datetime(2007,1,1,0,0)
            ,datetime.datetime(2007,1,1,0,1)
            ,datetime.datetime(2007,1,1,0,5)
            ,datetime.datetime(2007,1,1,0,8)
            ,datetime.datetime(2007,1,1,0,10)
            ]
df = pd.DataFrame([2,3,1,7,4],index = time_vec)

A normal resampling without weights works fine (using a lambda function as the how parameter, as suggested here: "pandas resampling using numpy percentile?" Thanks!):

df.resample('5min',how = lambda x: np.average(x[0]))

But if I try to use weights, it always raises TypeError: Axis must be specified when shapes of a and weights differ:

df.resample('5min',how = lambda x: np.average(x[0],weights = [1,2,3,4,5]))

I tried this with many different numbers of weights, but it did not get any better:

# Python 2: try weight vectors of length 0..19 and stop at the first that works
for i in xrange(20):
    try:
        print range(i)
        print df.resample('5min',how = lambda x:np.average(x[0],weights = range(i)))
        print i
        break
    except TypeError:
        print i,'TypeError'

I'd be glad for any suggestions.

The short answer here is that the weights in your lambda need to be created dynamically, based on the length of the series being averaged. In addition, you need to be careful about the types of the objects you're manipulating.

The code that I got to compute what I think you're trying to do is as follows:

df.resample('5min', how=lambda x: np.average(x, weights=1+np.arange(len(x))))
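Recent pandas releases no longer accept how= in resample. As a minimal sketch for such a version, the same weighted average can be passed through .resample(...).apply(...) on a Series instead. The names series and weighted_mean are mine, not from the original post:

```python
import datetime

import numpy as np
import pandas as pd

time_vec = [
    datetime.datetime(2007, 1, 1, 0, 0),
    datetime.datetime(2007, 1, 1, 0, 1),
    datetime.datetime(2007, 1, 1, 0, 5),
    datetime.datetime(2007, 1, 1, 0, 8),
    datetime.datetime(2007, 1, 1, 0, 10),
]
# Same values as the test case, held in a Series rather than a DataFrame
series = pd.Series([2, 3, 1, 7, 4], index=time_vec)


def weighted_mean(x):
    # x is the Series of values in one 5-minute bin.
    # The weights 1, 2, ..., len(x) are rebuilt for every bin.
    weights = 1 + np.arange(len(x))
    return np.average(x, weights=weights)


result = series.resample("5min").apply(weighted_mean)
print(result)
```

With the test data the three bins hold (2, 3), (1, 7) and (4), so the weighted averages come out as 8/3, 5.0 and 4.0.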

There are two differences compared to the line that was giving you problems:

First, x[0] is now x. The x object in the lambda is a pd.Series, so x[0] gives just the first value in the series. This was working without raising an exception in the first example (without the weights) because np.average(c) just returns c when c is a scalar. But I think it was actually computing the wrong averages even in that case, because each of the sampled subsets was just returning its first value as the "average".

Second, the weights are created dynamically, based on the length of the data in the series being resampled. You need to do this because the x in your lambda might be a series of a different length for each time interval being computed.
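Both points can be checked with NumPy alone. This is only a small sketch with made-up variable names (first_value, failures, values). It shows that a scalar goes through np.average unchanged, that a scalar combined with a weight vector of any length raises the TypeError from the question, and that weights built from the length of the data work:

```python
import numpy as np

# A scalar passes straight through np.average, so no error is raised,
# but nothing is averaged either.
first_value = 2
print(np.average(first_value))

# A scalar has shape (), a weight vector has shape (n,): they never match,
# so every length fails in the same way.
failures = []
for n in range(1, 5):
    try:
        np.average(first_value, weights=list(range(n)))
    except TypeError as exc:
        failures.append(n)
        print(n, "TypeError:", exc)

# Passing the whole bin with weights of matching length works.
values = np.array([2, 3])
print(np.average(values, weights=1 + np.arange(len(values))))
```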

The way I figured this out was through some simple type debugging, by replacing the lambda with a proper function definition:

def avg(x):
    # show what resample actually passes in for each bin
    print(type(x), x.shape, type(x[0]))
    return np.average(x, weights=np.arange(1, 1+len(x)))

df.resample('5min', how=avg)

This let me have a look at what was happening with the x variable. Hope that helps!
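The same kind of inspection can be done without writing an aggregation function at all. As a sketch, assuming a pandas version where the resampler object can be iterated like a groupby (the names series and bin_sizes are mine), this prints the label, type, length and values of each bin:

```python
import datetime

import pandas as pd

time_vec = [
    datetime.datetime(2007, 1, 1, 0, 0),
    datetime.datetime(2007, 1, 1, 0, 1),
    datetime.datetime(2007, 1, 1, 0, 5),
    datetime.datetime(2007, 1, 1, 0, 8),
    datetime.datetime(2007, 1, 1, 0, 10),
]
series = pd.Series([2, 3, 1, 7, 4], index=time_vec)

bin_sizes = []
for label, group in series.resample("5min"):
    # group is the Series of values that fall into the bin starting at label
    print(label, type(group).__name__, len(group), group.tolist())
    bin_sizes.append(len(group))
```

The bins do not all have the same length here, which is why a fixed weight list cannot fit every interval.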

python numpy pandas weighted-average
