apache spark - How to convert a LibSVM file with multiple classes into an RDD[LabeledPoint] -
I am using the following method from the org.apache.spark.mllib.util.MLUtils package, which loads binary labeled data in the LIBSVM format into an RDD[LabeledPoint], with the number of features determined automatically and the default number of partitions:

def loadLibSVMFile(sc: SparkContext, path: String): RDD[LabeledPoint]
My problem is loading data with multi-class labels. When I use this method on multi-class labeled data, it gets converted to binary labeled data. Is there a way to load multi-class data in LIBSVM format into an RDD[LabeledPoint]?
There is one more method in the same package with the following description:

Loads labeled data in the LIBSVM format into an RDD[LabeledPoint], with the default number of partitions.

def loadLibSVMFile(sc: SparkContext, path: String, numFeatures: Int): RDD[LabeledPoint]
But when I try to use it, I get the error "found Int, required Boolean".
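That mismatch would be consistent with building against an older Spark release: as far as I recall the pre-1.1 API, the three-argument overload of loadLibSVMFile took a Boolean multiclass flag rather than a feature count, which would produce exactly "found Int, required Boolean" when passing numFeatures. Treat the first signature below as an assumed reconstruction, not a verified API listing:

```scala
// Assumed Spark 1.0.x signature (hypothetical reconstruction):
// the third parameter is a multiclass flag, so an Int does not type-check.
def loadLibSVMFile(sc: SparkContext, path: String, multiclass: Boolean): RDD[LabeledPoint]

// Spark 1.1+ signature: labels are loaded as-is (multi-class included)
// and the third parameter is the feature count.
def loadLibSVMFile(sc: SparkContext, path: String, numFeatures: Int): RDD[LabeledPoint]
```

If that is the cause, upgrading to Spark 1.1 or later should make both the error and the binary-label conversion go away.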
Which version of Spark are you using? I used the file http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/glass.scale with Spark 1.1 and the following code:

val lbldRDD = MLUtils.loadLibSVMFile(sc, svmFile)
lbldRDD.map(_.label).collect().toSet.foreach(println)
and I see this output:
5.0
1.0
6.0
2.0
7.0
3.0
which seems right to me.
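The reason Spark 1.1 gets this right is that the LIBSVM text format itself places no restriction on the label: each line is "&lt;label&gt; &lt;index&gt;:&lt;value&gt; ...", and the label is just a number, so multi-class values (1.0 through 7.0 in glass.scale) pass through unchanged unless the loader deliberately binarizes them. A minimal, Spark-free Scala sketch of that parsing (the object and sample lines are made up for illustration):

```scala
object LibSVMMulticlassDemo {
  // Parse one LIBSVM line: "<label> <index>:<value> ...".
  // The label is read as a plain Double, so any class value survives.
  def parseLine(line: String): (Double, Seq[(Int, Double)]) = {
    val tokens = line.trim.split("\\s+")
    val label = tokens.head.toDouble
    val features = tokens.tail.toSeq.map { t =>
      val Array(i, v) = t.split(":")
      (i.toInt, v.toDouble)
    }
    (label, features)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample lines in glass.scale's style.
    val sample = Seq("1 1:0.28 2:0.35", "5 1:0.10 3:0.99", "7 2:0.42", "5 1:0.50")
    val distinctLabels = sample.map(line => parseLine(line)._1).toSet
    println(distinctLabels.toSeq.sorted.mkString(" ")) // 1.0 5.0 7.0
  }
}
```

Collecting the distinct labels, as the answer above does with .map(_.label).collect().toSet, is a quick sanity check that no binarization happened during loading.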
apache-spark libsvm mllib