spark-user mailing list archives

From hishamm <hisham.moha...@unige.ch>
Subject Decision Tree Model
Date Thu, 01 Oct 2015 14:20:16 GMT
Hi,

I am using Spark 1.4.0 with Python and decision trees to perform machine
learning classification.

I test it by creating the predictions and zipping them to the test data, as
follows:


predictions = tree_model.predict(test_data.map(lambda a: a.features))
labels = test_data.map(lambda a: a.label).zip(predictions)
correct = 100 * (labels.filter(lambda (v, p): v == p).count() /
                 float(test_data.count()))

I always get this error in the zipping phase:

Can not deserialize RDD with different number of items in pair: (3, 2)
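For what it's worth, the usual cause of that zip error is that predict() can return an RDD whose partitions don't line up element-for-element with test_data's, and RDD.zip() requires identical partitioning. A common workaround is to key both sides with zipWithIndex() and join on the index instead. The pairing logic is the same as in this plain-Python sketch (toy data, no Spark required to run it):

```python
# Plain-Python sketch of the zipWithIndex-then-join workaround.
# The labels/predictions values here are made-up toy data; in PySpark you
# would call rdd.zipWithIndex() on each RDD and join on the index key.
labels = [0.0, 1.0, 1.0, 0.0]
predictions = [0.0, 1.0, 0.0, 0.0]

# Key each sequence by position, as zipWithIndex() would.
indexed_labels = {i: v for i, v in enumerate(labels)}
indexed_preds = {i: p for i, p in enumerate(predictions)}

# Join on the index, which does not depend on partition layout.
paired = [(indexed_labels[i], indexed_preds[i]) for i in sorted(indexed_labels)]
correct = 100.0 * sum(1 for v, p in paired if v == p) / len(paired)
# correct == 75.0 for this toy data (3 of 4 pairs match)
```

In PySpark this would be roughly: zipWithIndex() each RDD, map each (value, index) pair to (index, value), then join the two keyed RDDs and take the values.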


To avoid zipping, I tried to do it in a different way, as follows:

labels = test_data.map(lambda a: (a.label, tree_model.predict(a.features)))
correct = 100 * (labels.filter(lambda (v, p): v == p).count() /
                 float(test_data.count()))

However, I always get this error:

in __getnewargs__(self)
    250         # This method is called when attempting to pickle SparkContext, which is always an error:
    251         raise Exception(
--> 252             "It appears that you are attempting to reference SparkContext from a broadcast "
    253             "variable, action, or transforamtion. SparkContext can only be used on the driver, "
    254             "not in code that it run on workers. For more information, see SPARK-5063."

Exception: It appears that you are attempting to reference SparkContext from
a broadcast variable, action, or transforamtion. SparkContext can only be
used on the driver, not in code that it run on workers. For more
information, see SPARK-5063.


Is the DecisionTreeModel part of the SparkContext?
I found that, using Scala, the second approach works with no problem.
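If it helps to see why the second approach blows up: in PySpark, the DecisionTreeModel wraps a JVM-side model and holds a reference to the SparkContext, so calling tree_model.predict inside a map closure forces Python to pickle the model (and, with it, the context) for shipping to the workers. A minimal stand-in, using an unpicklable threading.Lock in place of the SparkContext (an analogy, not Spark's actual internals), reproduces the same failure mode:

```python
import pickle
import threading

class Model:
    def __init__(self):
        # Analogous to PySpark's DecisionTreeModel holding a reference to
        # the SparkContext: a resource that cannot be pickled.
        self._ctx = threading.Lock()

    def predict(self, x):
        return 1.0 if x > 0 else 0.0

m = Model()
try:
    # This is roughly what Spark does when the model appears in a closure:
    # serialize it so it can be sent to the worker processes.
    pickle.dumps(m)
    shipped = True
except TypeError:
    # Pickling fails because the held resource is not serializable.
    shipped = False
# shipped is False: the model cannot be serialized to the workers
```

This is presumably also why the Scala version works: the Scala model lives on the JVM, is serializable, and does not drag the SparkContext along, so it can be used inside closures. In PySpark the supported pattern is to call predict on the driver with a whole RDD of features, as in the first approach.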


So, how can I solve these two problems?

Thanks and Regards,
Hisham

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Decision-Tree-Model-tp24899.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

