spark-issues mailing list archives

From "holdenk (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-12151) Improve PySpark MLLib prediction performance when using pickled vectors
Date Fri, 04 Dec 2015 21:41:10 GMT
holdenk created SPARK-12151:
-------------------------------

             Summary: Improve PySpark MLLib prediction performance when using pickled vectors
                 Key: SPARK-12151
                 URL: https://issues.apache.org/jira/browse/SPARK-12151
             Project: Spark
          Issue Type: Improvement
          Components: MLlib, PySpark
            Reporter: holdenk
            Priority: Minor


In a number of places in PySpark MLlib, calling predict on an RDD maps the Python
prediction function over the RDD. Instead, we could convert the RDD to an RDD of pickled Vectors
and then use the Java prediction function. This would be useful for models that have optimized
prediction on batches of objects (e.g. by broadcasting the relevant parts of the model or
similar).
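
A rough, Spark-free sketch of the two code paths, for illustration only. Everything in it
is a hypothetical stand-in: ToyLinearModel, predict_one, predict_pickled_batch,
predict_by_mapping and predict_by_pickling are not PySpark or MLlib names, a plain list
plays the role of one RDD partition, and a Python method plays the role of the Java
prediction function.

```python
import pickle


class ToyLinearModel:
    """Hypothetical stand-in for a model; not a PySpark or MLlib class."""

    def __init__(self, weights, intercept):
        self.weights = list(weights)
        self.intercept = intercept

    def predict_one(self, vector):
        # Per-element path: one Python call per record, as when the Python
        # prediction function is mapped over the data.
        return sum(w * x for w, x in zip(self.weights, vector)) + self.intercept

    def predict_pickled_batch(self, payload):
        # Batch path: stand-in for the Java prediction function receiving
        # pickled vectors. The model state is read once for the whole batch.
        vectors = pickle.loads(payload)
        weights, intercept = self.weights, self.intercept
        return [sum(w * x for w, x in zip(weights, v)) + intercept for v in vectors]


def predict_by_mapping(model, partition):
    # Roughly what "map the Python prediction function" amounts to.
    return [model.predict_one(v) for v in partition]


def predict_by_pickling(model, partition):
    # Proposed shape: pickle the vectors, then make one batch prediction call.
    payload = pickle.dumps(list(partition))
    return model.predict_pickled_batch(payload)


partition = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]
model = ToyLinearModel([0.5, 2.0], 1.0)
# Both paths must agree; only the calling pattern differs.
assert predict_by_mapping(model, partition) == predict_by_pickling(model, partition)
```

The sketch only shows the calling pattern. Any actual speedup would depend on the
serialization cost and on how the Java-side model handles batches.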



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

