spark-issues mailing list archives

From "Xusen Yin (JIRA)" <>
Subject [jira] [Commented] (SPARK-15574) Python meta-algorithms in Scala
Date Wed, 15 Jun 2016 20:21:09 GMT


Xusen Yin commented on SPARK-15574:

I just finished a prototype of PythonTransformer in Scala, a transformer wrapper for pure
Python transformers. It works well when I run it alone from the Scala side. But when I chain
the PythonTransformer with other transformers/estimators in a Pipeline, it fails because
transformSchema is missing on the Python side. AFAIK, we need to add transformSchema to
Python ML for pure Python PipelineStages. [~josephkb] [~mengxr]
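
To illustrate the gap, here is a minimal sketch of the kind of schema check that chaining
relies on. It is not the PySpark API: the class UpperCaseStage, the method name
transform_schema, and the helper pipeline_transform_schema are hypothetical, and schemas are
modeled as plain lists of (column name, type name) pairs instead of Spark's StructType so
that the example runs without Spark.

```python
# Hypothetical sketch, not the PySpark API. A schema is a plain list of
# (column_name, type_name) pairs standing in for Spark's StructType.


class UpperCaseStage:
    """Stand-in for a pure Python transformer that adds one string column."""

    def __init__(self, input_col, output_col):
        self.input_col = input_col
        self.output_col = output_col

    def transform_schema(self, schema):
        """Validate the input schema and return the schema after this stage."""
        names = [name for name, _ in schema]
        if self.input_col not in names:
            raise ValueError("missing input column: " + self.input_col)
        if self.output_col in names:
            raise ValueError("output column already exists: " + self.output_col)
        # Return a new list so the caller's schema is left untouched.
        return schema + [(self.output_col, "string")]


def pipeline_transform_schema(stages, schema):
    """Fold the schema through every stage, in order, before any data is touched."""
    for stage in stages:
        schema = stage.transform_schema(schema)
    return schema


stages = [UpperCaseStage("text", "upper"), UpperCaseStage("upper", "upper2")]
final_schema = pipeline_transform_schema(stages, [("text", "string")])
```

A stage that cannot report its output schema breaks this fold, which is the kind of failure
described above when a wrapped Python stage sits between other stages.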

> Python meta-algorithms in Scala
> -------------------------------
>                 Key: SPARK-15574
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, PySpark
>            Reporter: Joseph K. Bradley
> This is an experimental idea for implementing Python ML meta-algorithms (CrossValidator,
> TrainValidationSplit, Pipeline, OneVsRest, etc.) in Scala. This would require a Scala wrapper
> for algorithms implemented in Python, somewhat analogous to Python UDFs.
> The benefit of this change would be that we could avoid currently awkward conversions
> between Scala/Python meta-algorithms required for persistence. It would let us have full
> support for Python persistence and would generally simplify the implementation within MLlib.

This message was sent by Atlassian JIRA
