spark-user mailing list archives

From "Ulanov, Alexander" <alexander.ula...@hp.com>
Subject RE: Prediction using Classification with text attributes in Apache Spark MLLib
Date Tue, 24 Jun 2014 11:28:06 GMT
Hi,

You need to convert your text to a vector space model representation (http://en.wikipedia.org/wiki/Vector_space_model)
and then pass the resulting numeric vectors to SVM. As far as I know, in previous versions of MLlib there was a special
class for doing this: https://github.com/amplab/MLI/blob/master/src/main/scala/feat/NGrams.scala.
It is not compatible with Spark 1.0.
I wonder why the MLlib folks didn't include it in newer versions of Spark.
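
To make the vector space idea concrete, here is a minimal sketch in plain Python (no Spark needed) that turns each text into a sparse vector of term counts. The helper names (tokenize, build_vocabulary, to_term_counts) and the two sample sentences are made up for illustration, and a simple bag of words is only one possible choice of representation:

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_vocabulary(documents):
    """Map each distinct token to a 0-based column index, in sorted order."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}

def to_term_counts(text, vocabulary):
    """Return a sparse vector {column index: count}; unknown words are skipped."""
    counts = Counter(tokenize(text))
    return {vocabulary[t]: float(c) for t, c in counts.items() if t in vocabulary}

# Hypothetical sample data, just to show the shape of the result.
docs = ["good cheap product", "bad product, bad service"]
vocab = build_vocabulary(docs)
vectors = [to_term_counts(d, vocab) for d in docs]
```

Each text attribute becomes a set of numeric columns (one per word), which is the kind of input an SVM can work with.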

As a workaround, you could use a separate tool to convert your data to LibSVM format (http://stats.stackexchange.com/questions/61328/libsvm-data-format)
and then load it with MLUtils.loadLibSVMFile. For example, you could use Weka (http://www.cs.waikato.ac.nz/ml/weka/)
to convert your file; it has a friendly UI but doesn't handle big datasets.
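
If you would rather script the conversion, the LibSVM text format is simple enough to write by hand: one example per line, the label first, then index:value pairs for the non-zero features. Below is a minimal Python sketch; the function name to_libsvm_line and the sample data are hypothetical, and it assumes the features are already numeric (for instance word counts) and keyed by 0-based column index:

```python
def to_libsvm_line(label, features):
    """Format one example as '<label> <index>:<value> ...'.

    `features` maps 0-based column indices to numeric values.
    LibSVM indices are written 1-based and in ascending order.
    """
    parts = ["%g" % label]
    for index in sorted(features):
        parts.append("%d:%g" % (index + 1, features[index]))
    return " ".join(parts)

# Hypothetical examples: (label, sparse feature vector).
examples = [
    (1.0, {2: 1.0, 1: 1.0, 3: 1.0}),
    (0.0, {0: 2.0, 3: 1.0, 4: 1.0}),
]
lines = [to_libsvm_line(label, feats) for label, feats in examples]
```

Writing these lines to a text file, one per row, should give you something that MLUtils.loadLibSVMFile can read.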

Best regards, Alexander

-----Original Message-----
From: lmk [mailto:lakshmi.muralikrishnan@gmail.com] 
Sent: Tuesday, June 24, 2014 3:17 PM
To: user@spark.incubator.apache.org
Subject: Prediction using Classification with text attributes in Apache Spark MLLib

Hi,
I am trying to predict an attribute with a binary value (Yes/No) using SVM.
All the attributes in my training set are text attributes.
I understand that I have to convert my outcome to a double (0.0/1.0), but I do not understand
how to deal with my explanatory variables, which are also text.
Please let me know how I can do this.

Thanks.





--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Prediction-using-Classification-with-text-attributes-in-Apache-Spark-MLLib-tp8166.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
