spark-issues mailing list archives

From "Bryan Cutler (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-25309) Sci-kit Learn like Auto Pipeline Parallelization in Spark
Date Fri, 07 Sep 2018 17:05:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-25309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Cutler updated SPARK-25309:
---------------------------------
    Component/s: ML

> Sci-kit Learn like Auto Pipeline Parallelization in Spark 
> ----------------------------------------------------------
>
>                 Key: SPARK-25309
>                 URL: https://issues.apache.org/jira/browse/SPARK-25309
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, PySpark
>    Affects Versions: 2.3.1
>            Reporter: Ravi
>            Priority: Critical
>
> SPARK-19357 and SPARK-21911 have helped parallelize Pipelines in Spark. However, instead
> of requiring the user to pick an integer value for the parallelism parameter of the
> CrossValidator, it would be good to have something like n_jobs=-1 (as in scikit-learn),
> where the Pipeline DAG is automatically parallelized and scheduled based on the resources
> allocated to the Spark session.
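
To illustrate the request, here is a minimal sketch of how an n_jobs-style value could be resolved to a concrete integer before being handed to the existing parallelism parameter. The helper name resolve_parallelism and its available_cores argument are hypothetical and not part of Spark; the handling of negative values follows the joblib convention used by scikit-learn (-1 means all cores, -2 means all but one). This is an illustration of the idea, not the proposed implementation.

```python
def resolve_parallelism(n_jobs: int, available_cores: int) -> int:
    """Map a scikit-learn style n_jobs value to a positive integer.

    Hypothetical helper, not a Spark API.
    """
    if available_cores < 1:
        raise ValueError("available_cores must be at least 1")
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 has no meaning")
    if n_jobs > 0:
        # A positive value is taken as an explicit request.
        return n_jobs
    # joblib convention: -1 -> all cores, -2 -> all but one, and so on.
    # Never go below a single worker.
    return max(1, available_cores + 1 + n_jobs)


if __name__ == "__main__":
    for n_jobs in (-1, -2, 4):
        print(n_jobs, "->", resolve_parallelism(n_jobs, available_cores=8))
```

In PySpark the result could then be passed as CrossValidator(..., parallelism=resolve_parallelism(-1, cores)). Using spark.sparkContext.defaultParallelism as the value of cores is one possible proxy for the resources allocated to the session; that choice is an assumption here, not something the issue specifies.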



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
