spark-issues mailing list archives

From "Wojciech Jurczyk (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-7146) Should ML sharedParams be a public API?
Date Tue, 12 Jan 2016 14:17:39 GMT

    [ https://issues.apache.org/jira/browse/SPARK-7146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15093932#comment-15093932 ]

Wojciech Jurczyk commented on SPARK-7146:
-----------------------------------------

{quote}Cons:
Users have to be careful since parameters can have different meanings for different algorithms.{quote}

I think users have to be careful even if the traits stay private. I mean, the getters/setters
and the parameters themselves are visible anyway (users have to set the parameters somehow).

Consider a parameter called threshold. Obviously, it can have multiple meanings depending
on the context. Currently, threshold's meaning is hardcoded to binary classification,
so it can't be used in other cases.
{quote}Sharing the Param traits helps to encourage standardized Param names and documentation{quote}
but it also results in more specialized params (which restricts their use cases).

On the other hand, inputCol/outputCol are good examples of parameters that are fully universal
and generic. Having them in one trait would indeed result in some kind of standardization.
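
To illustrate the point about inputCol/outputCol, here is a minimal sketch in plain Scala. The traits and classes are simplified, hypothetical stand-ins (they are not Spark's actual sharedParams code); the sketch only shows how generic code could rely on such traits once they are visible.

```scala
// Simplified stand-ins for illustration; these are NOT Spark's actual classes.
object SharedColsSketch {
  // Hypothetical shared traits, each exposing one universal parameter.
  trait HasInputCol { def getInputCol: String }
  trait HasOutputCol { def getOutputCol: String }

  // Two unrelated stages that mix in the same traits.
  final case class Tokenizer(getInputCol: String, getOutputCol: String)
      extends HasInputCol with HasOutputCol
  final case class Scaler(getInputCol: String, getOutputCol: String)
      extends HasInputCol with HasOutputCol

  // Generic code can rely on the traits without knowing the concrete stage.
  def wiring(stage: HasInputCol with HasOutputCol): String =
    s"${stage.getInputCol} -> ${stage.getOutputCol}"

  // Check that each stage reads the column written by the previous one.
  def isChained(stages: Seq[HasInputCol with HasOutputCol]): Boolean =
    stages.zip(stages.drop(1)).forall { case (a, b) => a.getOutputCol == b.getInputCol }
}
```

Because the column parameters mean the same thing for every stage, a helper like the hypothetical isChained stays valid no matter which stages are passed in.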

{quote}If the shared Params are public, then implementations could test for the traits.{quote}
A side note: this can be done anyway (by structural typing). And it's not always a bad thing
(knowing that the meaning of the parameters can be different).
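
A rough sketch of the structural-typing route, again with hypothetical stand-in classes rather than Spark's real ones. It assumes only that the getter is public, which is the case being argued here:

```scala
import scala.language.reflectiveCalls

// Simplified stand-ins for illustration; these are NOT Spark's actual classes.
object StructuralSketch {
  // Structural type: anything with a matching getter, no shared trait needed.
  type WithInputCol = { def getInputCol: String }

  // Two classes with no common supertype besides AnyRef.
  final class Tokenizer { def getInputCol: String = "text" }
  final class Scaler { def getInputCol: String = "features" }

  // Accepts both classes; the getter call is resolved reflectively at runtime.
  def describe(stage: WithInputCol): String = s"reads ${stage.getInputCol}"
}
```

The reflective call is slower than a call through a named trait, which is one practical argument for exposing the trait itself.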
{quote}It is unclear if we want users to rely on these traits, which are somewhat experimental.{quote}
As I mentioned in SPARK-12751, we want to rely on the traits (for now only input/output column,
and obviously, only for Transformers that are not UnaryTransformers). As far as I know, the classes
in ML that use sharedParams are experimental, too (like LinearRegressionModel), so we depend
on an experimental API anyway.

Maybe the parameters can be divided into groups? Parameters in the first one would be fully
universal (like inputCol). In the second group parameters would be less universal (but still
shared, if used multiple times).
Additionally, I think some parameters should be removed from the shared params. Consider
the threshold from the shared params once again. It's used only in Logistic Regression (if
I'm correct). Other operations, like Binarizer, define their own threshold param.

Finally, I would vote for option (b). Overriding docs will do. And then, maybe it'd be possible
to split the trait into two, one for internal and one for external use, to benefit from having
both private and public traits.
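
One way to picture option (b) combined with the public/private split is the sketch below. Everything in it is hypothetical (the Param case class, the thresholdDoc hook, the doc strings and the enclosing object named ml are stand-ins, not Spark's API); it only shows a public universal trait next to an internal trait that forces each algorithm to write its own documentation.

```scala
// Simplified stand-ins for illustration; these are NOT Spark's actual classes.
object ml {
  final case class Param(name: String, doc: String)

  // Public trait: a fully universal parameter that users may rely on.
  trait HasInputCol {
    val inputCol: Param = Param("inputCol", "input column name")
  }

  // Internal trait: the name is shared, but each algorithm must supply its own doc.
  private[ml] trait HasThreshold {
    protected def thresholdDoc: String
    lazy val threshold: Param = Param("threshold", thresholdDoc)
  }

  final class LogisticRegression extends HasInputCol with HasThreshold {
    protected def thresholdDoc: String =
      "probability cutoff for predicting the positive class"
  }

  final class Binarizer extends HasInputCol with HasThreshold {
    protected def thresholdDoc: String =
      "feature values above this become 1.0, all others 0.0"
  }
}
```

Here the parameter name stays standardized while its meaning is documented per algorithm, and code outside ml can only name the universal trait.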


> Should ML sharedParams be a public API?
> ---------------------------------------
>
>                 Key: SPARK-7146
>                 URL: https://issues.apache.org/jira/browse/SPARK-7146
>             Project: Spark
>          Issue Type: Brainstorming
>          Components: ML
>            Reporter: Joseph K. Bradley
>
> Discussion: Should the Param traits in sharedParams.scala be public?
> Pros:
> * Sharing the Param traits helps to encourage standardized Param names and documentation.
> Cons:
> * Users have to be careful since parameters can have different meanings for different algorithms.
> * If the shared Params are public, then implementations could test for the traits.  It is unclear if we want users to rely on these traits, which are somewhat experimental.
> Currently, the shared params are private.
> Proposal: Either
> (a) make the shared params private to encourage users to write specialized documentation and value checks for parameters, or
> (b) design a better way to encourage overriding documentation and parameter value checks



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

