Monday, 15 March 2010

Explanation of the aggregate Scala function




I do not yet understand the aggregate function.

For example, given:

val x = List(1, 2, 3, 4, 5, 6)
val y = x.par.aggregate((0, 0))((x, y) => (x._1 + y, x._2 + 1), (x, y) => (x._1 + y._1, x._2 + y._2))

The result will be: (21,6)

Well, I think that (x,y) => (x._1 + y._1, x._2 + y._2) combines the results in parallel; for example it would be (1 + 2, 1 + 1), and so on.

But this is the part that leaves me confused:

(x, y) => (x._1 + y, x._2 + 1)

Why x._1 + y? And is x._2 equal to 0 here?

Thanks in advance.

From the documentation:

def aggregate[B](z: ⇒ B)(seqop: (B, A) ⇒ B, combop: (B, B) ⇒ B): B

Aggregates the results of applying an operator to subsequent elements.

This is a more general form of fold and reduce. It has similar semantics, but does not require the result to be a supertype of the element type. It traverses the elements in different partitions sequentially, using seqop to update the result, and then applies combop to the results from different partitions. The implementation of this operation may operate on an arbitrary number of collection partitions, so combop may be invoked an arbitrary number of times.

For example, one might want to process some elements and then produce a Set. In this case, seqop would process an element and append it to the list, while combop would concatenate two lists from different partitions together. The initial value z would be an empty set.

pc.aggregate(Set[Int]())(_ += process(_), _ ++ _)
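The Set example from the documentation can be sketched with the standard library alone. This is only an illustration under stated assumptions: `process` is a hypothetical stand-in (the documentation does not define it), the partitions are simulated with `grouped` rather than `.par`, and `+` on an immutable Set is used in place of the `+=` shown above.

```scala
object SetAggregateSketch {
  // Hypothetical stand-in for the documentation's `process`.
  def process(n: Int): Int = n % 3

  val elems = List(1, 2, 3, 4, 5, 6)

  // seqop: fold one processed element into a partition's set.
  val seqop: (Set[Int], Int) => Set[Int] = (acc, elem) => acc + process(elem)

  // combop: merge the sets produced by two partitions.
  val combop: (Set[Int], Set[Int]) => Set[Int] = (a, b) => a ++ b

  // Simulated partitions of size 3; each one starts from the empty set (z).
  val partials: List[Set[Int]] =
    elems.grouped(3).map(_.foldLeft(Set.empty[Int])(seqop)).toList

  val result: Set[Int] = partials.reduce(combop)
}
```

Here `SetAggregateSketch.result` contains 0, 1 and 2, regardless of how the elements are split into partitions.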

Another example is calculating the geometric mean of a collection of doubles (one would typically require big doubles for this).

B: the type of accumulated results
z: the initial value for the accumulated result of the partition. This will typically be the neutral element for the seqop operator (e.g. Nil for list concatenation or 0 for summation) and may be evaluated more than once.
seqop: an operator used to accumulate results within a partition
combop: an associative operator used to combine results from different partitions
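The geometric mean case mentioned in the documentation might look roughly like the sketch below. It makes two assumptions that are not in the documentation: it accumulates a sum of logarithms instead of a running product (which sidesteps the need for big doubles), and it simulates partitions with `grouped` instead of using `.par`. The input values are made up for illustration.

```scala
object GeometricMeanSketch {
  val xs = List(1.0, 2.0, 4.0, 8.0)

  // B is (sum of logs, element count); z is the neutral element (0.0, 0).
  val partials: List[(Double, Int)] =
    xs.grouped(2)
      .map(_.foldLeft((0.0, 0)) { case ((logSum, n), d) => (logSum + math.log(d), n + 1) }) // seqop
      .toList

  // combop: add the partial sums and the partial counts.
  val total: (Double, Int) = partials.reduce((a, b) => (a._1 + b._1, a._2 + b._2))

  // exp(mean of logs) equals the n-th root of the product.
  val geometricMean: Double = math.exp(total._1 / total._2)
}
```

For these four values the product is 64, so the result is the fourth root of 64, approximately 2.828.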

In your example B is a Tuple2[Int, Int]. The method seqop takes a single element from the list, scoped as y, and updates the aggregate B to (x._1 + y, x._2 + 1). So it adds the element to the first component of the tuple and increments the second. This effectively puts the sum of the elements into the first element of the tuple and the number of elements into the second element of the tuple.

The method combop then takes the results from each parallel execution thread and combines them. Combining by addition gives the same result as if the operation were run on the list sequentially.

Using a tuple for B is probably the confusing piece of this. You can break the problem down into two sub-problems to get a better idea of what it is doing. res0 is the first element of the result tuple, and res1 is the second element of the result tuple.
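To make the two steps concrete, here is a minimal sketch that imitates what aggregate does, using only `grouped`, `foldLeft` and `reduce`. The partition size of 2 is an arbitrary assumption for illustration; the real parallel implementation chooses its own partitions. Note that the accumulator is (0, 0) only at the start of each partition, which is why x._2 is 0 for the first element a partition sees and grows after that.

```scala
object AggregateSketch {
  val x = List(1, 2, 3, 4, 5, 6)

  // seqop: acc is the running (sum, count) for one partition, elem is one list element.
  val seqop: ((Int, Int), Int) => (Int, Int) =
    (acc, elem) => (acc._1 + elem, acc._2 + 1)

  // combop: merges the (sum, count) pairs of two partitions.
  val combop: ((Int, Int), (Int, Int)) => (Int, Int) =
    (a, b) => (a._1 + b._1, a._2 + b._2)

  // Simulated partitions List(1, 2), List(3, 4), List(5, 6), each folded from z = (0, 0).
  val perPartition: List[(Int, Int)] =
    x.grouped(2).map(_.foldLeft((0, 0))(seqop)).toList
  // perPartition is List((3,2), (7,2), (11,2))

  val result: (Int, Int) = perPartition.reduce(combop)
  // result is (21,6), the same as folding the whole list sequentially
}
```

Changing the partition size changes `perPartition` but not `result`, because combop is associative and (0, 0) is neutral for it.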

// Sums all elements in parallel.
scala> x.par.aggregate(0)((x, y) => x + y, (x, y) => x + y)
res0: Int = 21

// Counts all elements in parallel.
scala> x.par.aggregate(0)((x, y) => x + 1, (x, y) => x + y)
res1: Int = 6
