My Blog: python - Pandas Series : faster way of computing periods back to preceding high -

Monday, 15 February 2010

python - Pandas Series : faster way of computing periods back to preceding high -

I have a time series of price information stored as open, high, low, close values in a DataFrame.

I want to create a new column in which each element records how many days back you need to go to find a high that is higher, in the source array.

So for a series like this:

    import pandas as pd
    import numpy as np
    my_vals = pd.Series([10.1, 9.0, 2.4, 8.2, 7.0, 6.1, 5.4, 9.4, 8.7, 11.8, 3.5, 4.7, 5.4, 6.4, 7.8, 8.0, 9.1, 10.2, 11.0, 2.0])

we would get these values: [nan, 1, 1, 2, 1, 1, 1, 7, 1, nan, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1]
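To pin down the definition, here is a minimal brute-force sketch in plain Python. The function name `periods_back_to_higher` is made up for illustration, and it assumes "higher" means strictly greater and that positions with no earlier higher value get nan. It is only a reference for checking results, not a fast solution.

```python
import math


def periods_back_to_higher(values):
    """For each position, count the steps back to the nearest earlier
    value that is strictly higher (nan when there is none)."""
    result = []
    for i, current in enumerate(values):
        steps = math.nan  # stays nan if no earlier value is higher
        for j in range(i - 1, -1, -1):  # walk backwards from i - 1 to 0
            if values[j] > current:
                steps = i - j
                break
        result.append(steps)
    return result


my_vals = [10.1, 9.0, 2.4, 8.2, 7.0, 6.1, 5.4, 9.4, 8.7, 11.8,
           3.5, 4.7, 5.4, 6.4, 7.8, 8.0, 9.1, 10.2, 11.0, 2.0]
print(periods_back_to_higher(my_vals))
```

The inner loop can scan the whole prefix, so the worst case is quadratic in the length of the series.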

I wrote some code using rolling_apply, which works, but it is really slow, and I'm convinced there's a far better way to do this.

    def countdayssincehigherhigh(x):
        aaa = pd.Series(x)
        zzz = x[-1]  # looking for values higher than zzz
        bbb = aaa[:-1]  # array without the last element
        ccc = bbb[bbb > zzz]  # keep only the elements higher than zzz
        ddd = ccc.last_valid_index()
        if ddd is None:
            return np.nan  # or return 10000 to match the window length
        else:
            return aaa.last_valid_index() - ddd

And to compute the new column I do:

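    new_col = pd.rolling_apply(my_vals, 10000, countdayssincehigherhigh, min_periods=0)
    # In newer pandas, where pd.rolling_apply no longer exists, the rough equivalent is:
    # new_col = my_vals.rolling(10000, min_periods=0).apply(countdayssincehigherhigh, raw=True)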

Any advice appreciated :)

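You can do it with two loops, but the worst-case time complexity may be O(n**2). Here is a method that can do it in O(n*log(n)):

The algorithm:

First, call argsort() on the array to get the index array. Then, for every element idx in the index array, look at the elements that come after it in the index array (the positions of larger values) and find the largest one that is less than idx. To do this quickly, you can use a sorted list. Here are two libraries that implement a sorted list: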

http://www.grantjenks.com/docs/sortedcontainers/sortedlist.html

http://stutzbachenterprises.com/blist/sortedlist.html

Here is the code:

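    import numpy as np
    from sortedcontainers import SortedList

    def nearest_hi_value(my_vals):
        # positions of the values, in ascending order of value
        index = np.argsort(my_vals)
        # positions not yet processed, kept sorted
        # (sortedcontainers 1.x also accepted a load=100 tuning argument here)
        sl = SortedList(range(len(index)))
        res = []
        for idx in index.tolist():
            sl.remove(idx)
            # number of remaining positions that lie before idx
            idx2 = sl.bisect_left(idx)
            if idx2 > 0:
                res.append(idx - sl[idx2 - 1])
            else:
                res.append(0)  # no higher value before idx
        result = np.zeros_like(index)
        result[index] = res
        return result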

If two consecutive elements in the array are the same, nearest_hi_value() may return 1 for the second one even though the preceding value is not strictly higher; this can be fixed easily.
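One possible way to handle the tie case is to break ties by position when sorting, so that an earlier equal value has already been removed by the time a later one is queried. The sketch below illustrates that idea under some assumptions: the name `nearest_higher_strict` is hypothetical, positions with no higher predecessor get nan rather than 0, and a plain list with the standard-library `bisect` module stands in for the sorted list so the example runs without extra packages. Deleting from a plain list is O(n), so this version shows the tie handling but does not keep the O(n*log(n)) bound.

```python
import math
from bisect import bisect_left


def nearest_higher_strict(values):
    """Periods back to the nearest preceding strictly higher value."""
    n = len(values)
    # Sort positions by value; equal values are ordered by position, so an
    # earlier equal value is processed (and removed) before a later one.
    order = sorted(range(n), key=lambda i: (values[i], i))
    remaining = list(range(n))  # unprocessed positions, always sorted
    result = [math.nan] * n
    for idx in order:
        pos = bisect_left(remaining, idx)  # idx is still in the list here
        del remaining[pos]  # O(n) on a list; a sorted list avoids this cost
        if pos > 0:
            # remaining[pos - 1] is the closest earlier position whose
            # value is strictly higher than values[idx]
            result[idx] = idx - remaining[pos - 1]
    return result


print(nearest_higher_strict([5, 5, 3, 5]))
```

With NumPy, the same ordering should be obtainable from a stable sort, for example `np.argsort(my_vals, kind="mergesort")`, since a stable sort keeps equal values in their original order.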

Here is the result check:

    my_vals = np.random.rand(1000)
    res1 = pd.rolling_apply(my_vals, 10000, countdayssincehigherhigh, min_periods=0)
    res2 = nearest_hi_value(my_vals)
    np.allclose(res1, res2)

Here is the timeit result:

    %timeit pd.rolling_apply(my_vals, 10000, countdayssincehigherhigh, min_periods=0)
    %timeit nearest_hi_value(my_vals)

Output:

    1 loops, best of 3: 489 ms per loop
    100 loops, best of 3: 10.4 ms per loop

python numpy pandas
