Thursday, 15 September 2011

Pipe groupByKey data in Apache Spark

I want to process the following data set with an external script, piped from Apache Spark:

key,val1,val2
1,a,b
1,c,d
1,e,f
2,g,h
2,i,j
2,k,l

The data should first be grouped by key, and then each key's values should be passed to the external script using pipe().

I tried this code, but it calls the script only once and passes all the data to it:

// Split each CSV line, key it by the first column ("key"), group by key, then pipe to test.sh
data.map(s => s.split(",")).map(a => (a(0), a)).groupByKey().pipe(Seq(SparkFiles.get("test.sh")))
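
Spark's RDD.pipe() forks the external command once per partition, not once per key, and writes each element of that partition to the command's stdin as one line (its toString by default). If all the grouped data sits in one partition, that would be consistent with the script running only once; note also that an Array[String] prints as an object reference such as [Ljava.lang.String;@..., not as its fields. The Scala sketch below is a local, Spark-free illustration of the per-key behaviour asked for here: group the rows by the first column and start the command once per key, writing that key's values to its stdin. It uses only scala.sys.process from the standard library; the names PipePerKey and runPerKey are hypothetical, and `sort -r` is just a stand-in for test.sh that assumes a POSIX `sort` on the PATH.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import scala.sys.process._

// Hypothetical helper, not a Spark API: a local illustration of "one process per key".
object PipePerKey {
  // Groups parsed CSV rows by their first column and runs `command` once per key,
  // writing that key's remaining columns to the process's stdin, one row per line.
  // Returns each key's trimmed stdout. `.!!` throws if the command exits non-zero.
  def runPerKey(rows: Seq[Array[String]], command: Seq[String]): Map[String, String] =
    rows
      .groupBy(row => row(0)) // first column is "key"
      .map { case (key, group) =>
        val input = group.map(_.drop(1).mkString(",")).mkString("", "\n", "\n")
        val stdin = new ByteArrayInputStream(input.getBytes(StandardCharsets.UTF_8))
        key -> (command #< stdin).!!.trim // one external process per key
      }

  def main(args: Array[String]): Unit = {
    // Sample rows from the question, without the header line.
    val rows = Seq("1,a,b", "1,c,d", "1,e,f", "2,g,h", "2,i,j", "2,k,l").map(_.split(","))
    // `sort -r` is only a stand-in for test.sh so the sketch runs without Spark.
    val results = runPerKey(rows, Seq("sort", "-r"))
    results.toSeq.sortBy(_._1).foreach { case (key, out) => println(s"key $key:\n$out") }
  }
}
```

Inside Spark itself the script would still be launched per partition; one possible adjustment is to map each (key, values) pair to a single readable line (for example with mkString) before calling pipe(), so the script at least receives one line per key.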
