Tuesday, 15 February 2011

apache spark - how to convert a LibSVM file with multiple classes into an RDD[LabeledPoint] -



I am using the following method from the org.apache.spark.mllib.util.MLUtils package, which loads binary labeled data in LIBSVM format into an RDD[LabeledPoint], with the number of features determined automatically and the default number of partitions:

def loadLibSVMFile(sc: SparkContext, path: String): RDD[LabeledPoint]

My problem is loading data with multiclass labels. When I use this method on multiclass labeled data, it gets converted into binary labeled data. Is there a way to load multiclass data in LIBSVM format into an RDD[LabeledPoint]?
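For reference, LIBSVM format stores one example per line as `label index:value index:value ...`, so the label field is just a number and multiclass labels are representable directly. A hypothetical three-class sample (made-up values, not from glass.scale):

```
1.0 1:0.5 3:1.2
3.0 2:0.7
7.0 1:0.1 4:2.5
```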

There is one more method in the same package with the following description:

Loads labeled data in LIBSVM format into an RDD[LabeledPoint], using the default number of partitions.

def loadLibSVMFile(sc: SparkContext, path: String, numFeatures: Int): RDD[LabeledPoint]

But when I try to use it, I get the error "found: Int, required: Boolean".
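A likely cause, assuming the asker is on Spark 1.0.x: in that release the third positional parameter of this method was a `multiclass: Boolean` flag (removed as unnecessary and deprecated in 1.1, since the loader reads labels as-is), so passing an Int where a Boolean is expected produces exactly this type mismatch. A sketch of the two signatures:

```scala
// Spark 1.0.x (deprecated in 1.1): third parameter is a Boolean flag
def loadLibSVMFile(sc: SparkContext, path: String, multiclass: Boolean): RDD[LabeledPoint]

// Spark 1.1+: third parameter is the feature count
def loadLibSVMFile(sc: SparkContext, path: String, numFeatures: Int): RDD[LabeledPoint]
```

Upgrading to 1.1, or dropping the third argument entirely, avoids the mismatch.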

Which version of Spark are you using? I used the file http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/glass.scale

with Spark 1.1 and the following code:

import org.apache.spark.mllib.util.MLUtils
val lbldRdd = MLUtils.loadLibSVMFile(sc, svmFile)
lbldRdd.map(_.label).collect().toSet.foreach(println)

and I see this output:

5.0
1.0
6.0
2.0
7.0
3.0

which seems correct to me: all six class labels of the glass dataset are preserved, not binarized.
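To convince yourself that the loader has no reason to binarize anything, you can mimic its per-line parsing in plain Scala, independent of Spark. This is a minimal sketch of the LIBSVM line format (`parseLibSVMLine` is a hypothetical helper, not the MLUtils internal):

```scala
// Minimal LIBSVM line parser sketch: "label index:value index:value ..."
// The label is parsed as a plain Double, so multiclass labels (e.g. 7.0)
// survive untouched.
def parseLibSVMLine(line: String): (Double, Array[Int], Array[Double]) = {
  val tokens = line.trim.split("\\s+")
  val label = tokens.head.toDouble
  val (indices, values) = tokens.tail.map { t =>
    val Array(i, v) = t.split(':')
    (i.toInt - 1, v.toDouble) // LIBSVM indices are 1-based
  }.unzip
  (label, indices, values)
}

// Example: a line from a multiclass file keeps its label 7.0
val (label, idx, vals) = parseLibSVMLine("7.0 1:0.5 3:1.2")
println(label) // 7.0
```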

apache-spark libsvm mllib
