spark-issues mailing list archives

From "Joseph K. Bradley (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-6682) Deprecate static train and use builder instead for Scala/Java
Date Wed, 08 Apr 2015 22:12:13 GMT

    [ https://issues.apache.org/jira/browse/SPARK-6682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486178#comment-14486178 ]

Joseph K. Bradley edited comment on SPARK-6682 at 4/8/15 10:12 PM:
-------------------------------------------------------------------

*Optimization*: I agree that this JIRA will be (or should be) blocked by updates to the optimization
API.  It is getting to be high time to fix that.  I'll link this to [SPARK-5256] and will
add my thoughts there.

*Splitting into sub-tasks*: I think this should be split into subtasks per algorithm, rather
than splitting up deprecation/example/documentation.  That way, each subtask makes a consistent
change but should be sufficiently small.


was (Author: josephkb):
*Optimization*: I agree that this JIRA will be blocked by updates to the optimization API.
 It is getting to be high time to fix that.  I'll link this to [SPARK-5256] and will add my
thoughts there.

*Splitting into sub-tasks*: I think this should be split into subtasks per algorithm, rather
than splitting up deprecation/example/documentation.  That way, each subtask makes a consistent
change but should be sufficiently small.

> Deprecate static train and use builder instead for Scala/Java
> -------------------------------------------------------------
>
>                 Key: SPARK-6682
>                 URL: https://issues.apache.org/jira/browse/SPARK-6682
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.3.0
>            Reporter: Joseph K. Bradley
>
> In MLlib, we have for some time been unofficially moving away from the old static train()
> methods and moving towards builder patterns.  This JIRA is to discuss this move and (hopefully)
> make it official.
> "Old static train()" API:
> {code}
> val myModel = NaiveBayes.train(myData, ...)
> {code}
> "New builder pattern" API:
> {code}
> val nb = new NaiveBayes().setLambda(0.1)
> val myModel = nb.train(myData)
> {code}
> Pros of the builder pattern:
> * Much less code when algorithms have many parameters.  Since Java does not support default
>   arguments, we required *many* duplicated static train() methods (for each prefix set of arguments).
> * Helps to enforce default parameters.  Users should ideally not have to even think about
>   setting parameters if they just want to try an algorithm quickly.
> * Matches spark.ml API
> Cons of the builder pattern:
> * In Python APIs, static train methods are more "Pythonic."
> Proposal:
> * Scala/Java: We should start deprecating the old static train() methods.  We must keep
>   them for API stability, but deprecating will help with API consistency, making it clear that
>   everyone should use the builder pattern.  As we deprecate them, we should make sure that the
>   builder pattern supports all parameters.
> * Python: Keep static train methods.
> CC: [~mengxr]
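
For illustration only, the Scala/Java proposal in the description above could look roughly like the sketch below. Everything in it is hypothetical: ToyEstimator, ToyModel, the maxIter parameter, the default values, and the placeholder version string are made up for this example and are not MLlib's actual classes or defaults. It shows a builder whose parameters all have defaults and chained setters, with the old static train() overloads kept for API stability but marked deprecated and delegating to the builder.

```scala
// Hypothetical toy classes for illustration; this is NOT MLlib code.

/** Toy "model" that only records the settings it was trained with. */
final case class ToyModel(lambda: Double, maxIter: Int, numExamples: Int)

/** Builder-style estimator: each parameter has a default and a chained setter. */
class ToyEstimator {
  private var lambda: Double = 1.0 // assumed default, chosen for this sketch
  private var maxIter: Int = 100   // assumed default, chosen for this sketch

  def setLambda(value: Double): this.type = {
    require(value >= 0.0, s"lambda must be nonnegative but got $value")
    lambda = value
    this
  }

  def setMaxIter(value: Int): this.type = {
    require(value > 0, s"maxIter must be positive but got $value")
    maxIter = value
    this
  }

  /** "Training" here just packages the current settings with the data size. */
  def train(data: Seq[Double]): ToyModel = ToyModel(lambda, maxIter, data.size)
}

/** Old static-style API: one overload per prefix of the argument list. */
object ToyEstimator {
  @deprecated("Use new ToyEstimator().setLambda(...).train(data)", "placeholder-version")
  def train(data: Seq[Double], lambda: Double): ToyModel =
    new ToyEstimator().setLambda(lambda).train(data)

  @deprecated("Use new ToyEstimator().setLambda(...).setMaxIter(...).train(data)", "placeholder-version")
  def train(data: Seq[Double], lambda: Double, maxIter: Int): ToyModel =
    new ToyEstimator().setLambda(lambda).setMaxIter(maxIter).train(data)
}

object BuilderDemo {
  def main(args: Array[String]): Unit = {
    val data = Seq(1.0, 2.0, 3.0)
    // Quick try: no parameters need to be set.
    println(new ToyEstimator().train(data))
    // Tuned run: set only the parameter of interest.
    println(new ToyEstimator().setLambda(0.1).train(data))
  }
}
```

In this sketch, adding a new parameter means adding one field and one setter. The static style would need yet another overload for each new argument prefix, which is the duplication the first "pro" describes. Callers of the deprecated overloads still compile and get the same result, but see a deprecation warning pointing them to the builder.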



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

