spark-issues mailing list archives

From "Bryan Cutler (JIRA)" <>
Subject [jira] [Created] (SPARK-17161) Add PySpark-ML JavaWrapper convenience function to create py4j JavaArrays
Date Fri, 19 Aug 2016 23:11:20 GMT
Bryan Cutler created SPARK-17161:

             Summary: Add PySpark-ML JavaWrapper convenience function to create py4j JavaArrays
                 Key: SPARK-17161
             Project: Spark
          Issue Type: Improvement
          Components: ML, PySpark
            Reporter: Bryan Cutler
            Priority: Minor

Spark ML often has classes whose constructors take a Scala `Array`.  To expose the same API
in Python, a Java-friendly alternate constructor currently has to exist so that py4j can call
it with a converted Python list.  This is because the current conversion in PySpark,
_py2java, turns a list into a java.util.ArrayList, as shown in this error message:

Py4JError: An error occurred while calling
py4j.Py4JException: Constructor[class java.util.ArrayList])
does not exist
	at py4j.reflection.ReflectionEngine.getConstructor(
	at py4j.reflection.ReflectionEngine.getConstructor(
	at py4j.Gateway.invoke(

An alternate constructor can be avoided by building a py4j JavaArray with {{new_array}}.
This type is compatible with the Scala `Array` currently used in classes like {{CountVectorizerModel}}
and {{StringIndexerModel}}.

Most of the boilerplate Python code to do this can be put in a convenience function inside
ml.JavaWrapper, giving a clean way to construct ML objects without adding special constructors.
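
A minimal sketch of what such a helper might look like, not the final implementation. The function name `new_java_array` and its `gateway` parameter are hypothetical; the only py4j call assumed is `JavaGateway.new_array(java_class, *dimensions)`, and in PySpark the gateway would presumably come from the active SparkContext.

```python
def new_java_array(gateway, java_class, pylist):
    """Copy a Python list into a py4j JavaArray of the given element type.

    gateway    -- a py4j JavaGateway (assumption: in PySpark, the one held
                  by the active SparkContext)
    java_class -- py4j JavaClass for the elements, e.g.
                  gateway.jvm.java.lang.String
    pylist     -- Python list whose items py4j can convert to java_class
    """
    # Allocate a one-dimensional Java array of the same length as the list.
    java_array = gateway.new_array(java_class, len(pylist))
    # JavaArray supports item assignment, so fill it element by element.
    for i, value in enumerate(pylist):
        java_array[i] = value
    return java_array


# Hypothetical usage with a running gateway:
#   labels = new_java_array(gateway, gateway.jvm.java.lang.String, ["a", "b"])
# The result could then be passed to a constructor that expects Array[String].
```

Passing the gateway in explicitly keeps the sketch independent of how JavaWrapper would look up the SparkContext.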
