spark-issues mailing list archives

From "Bryan Cutler (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-17161) Add PySpark-ML JavaWrapper convenience function to create py4j JavaArrays
Date Fri, 19 Aug 2016 23:11:20 GMT
Bryan Cutler created SPARK-17161:
------------------------------------

             Summary: Add PySpark-ML JavaWrapper convenience function to create py4j JavaArrays
                 Key: SPARK-17161
                 URL: https://issues.apache.org/jira/browse/SPARK-17161
             Project: Spark
          Issue Type: Improvement
          Components: ML, PySpark
            Reporter: Bryan Cutler
            Priority: Minor


Several Spark ML classes take a Scala {{Array}} as a constructor parameter.  To expose the same
API in Python, a Java-friendly alternate constructor currently has to exist, because PySpark's
{{_py2java}} converts a Python list to a {{java.util.ArrayList}}, which py4j cannot match against
an {{Array}} parameter, as shown in this error message:

{noformat}
Py4JError: An error occurred while calling None.org.apache.spark.ml.feature.CountVectorizerModel.
Trace:
py4j.Py4JException: Constructor org.apache.spark.ml.feature.CountVectorizerModel([class java.util.ArrayList])
does not exist
	at py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:179)
	at py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:196)
	at py4j.Gateway.invoke(Gateway.java:235)
{noformat}

Such alternate constructors can be avoided by building a py4j {{JavaArray}} with {{new_array}}.
This type is compatible with the Scala {{Array}} currently used in classes like {{CountVectorizerModel}}
and {{StringIndexerModel}}.

Most of the boilerplate Python code for this can live in a convenience function on
{{ml.JavaWrapper}}, giving a clean way to construct ML objects without adding special constructors.
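A minimal sketch of what such a helper could look like; the name {{_new_java_array}} and its signature are assumptions for illustration, not the final API:

{noformat}
# Hypothetical sketch of the proposed convenience helper. The function name
# and signature are assumptions; only gateway.new_array is real py4j API.
def _new_java_array(gateway, java_class, pylist):
    """Convert a Python list into a py4j JavaArray via gateway.new_array.

    Unlike _py2java, which produces a java.util.ArrayList, new_array
    allocates a real Java array that py4j can pass to constructors
    expecting a Scala Array.
    """
    java_array = gateway.new_array(java_class, len(pylist))
    for i, element in enumerate(pylist):
        java_array[i] = element
    return java_array
{noformat}

With an active SparkContext {{sc}}, something like {{_new_java_array(sc._gateway, sc._jvm.java.lang.String, vocab)}} could then be passed straight to the {{CountVectorizerModel}} constructor without triggering the {{ArrayList}} mismatch above.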



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

