spark-issues mailing list archives

From "Nick Pentreath (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-14760) Feature transformers should always invoke transformSchema in transform or fit
Date Thu, 21 Apr 2016 06:43:25 GMT

    [ https://issues.apache.org/jira/browse/SPARK-14760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15251391#comment-15251391
] 

Nick Pentreath commented on SPARK-14760:
----------------------------------------

Given the name {{transformSchema}}, one would expect the method to transform the input schema
into the output schema. It does, but only a few transformers seem to actually use the schema
it returns. As a result, the output schema declared by {{transformSchema}} is not actually
enforced in {{fit}} or {{transform}}.

So in a Pipeline, you can call {{transformSchema}} for each stage, which performs validation
upfront. But if the individual transformers don't enforce the output schema it returns, schema
validation can succeed while a pipeline stage produces something different at runtime and
breaks the pipeline.
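
To make that failure mode concrete, here is a rough sketch in plain Python (not the Spark API). Schemas are modelled as dicts from column name to type name, {{transform}} is simplified to operate on the schema alone, and every name in it ({{MismatchedStage}}, {{VectorConsumer}}, {{validate}}, {{run}}) is hypothetical. Upfront validation passes, yet the run fails because the first stage produces something other than what its {{transform_schema}} declared.

```python
# Toy model, not the Spark API: a schema is a dict of column name -> type name.

class MismatchedStage:
    """Hypothetical stage whose declared and actual output schemas disagree."""

    def transform_schema(self, schema):
        # Declares that it appends a "features" column of type "vector".
        if "text" not in schema:
            raise ValueError("missing input column 'text'")
        return {**schema, "features": "vector"}

    def transform(self, schema):
        # Never consults transform_schema and appends a differently typed column.
        return {**schema, "features": "string"}


class VectorConsumer:
    """Hypothetical downstream stage that requires a vector column."""

    def transform_schema(self, schema):
        if schema.get("features") != "vector":
            raise ValueError("'features' must be of type 'vector'")
        return {**schema, "prediction": "double"}

    def transform(self, schema):
        # This stage does check its input against its own schema method.
        return self.transform_schema(schema)


def validate(stages, schema):
    """Upfront validation: chain transform_schema across all stages."""
    for stage in stages:
        schema = stage.transform_schema(schema)
    return schema


def run(stages, schema):
    """Actual execution: chain transform across all stages."""
    for stage in stages:
        schema = stage.transform(schema)
    return schema


stages = [MismatchedStage(), VectorConsumer()]
input_schema = {"text": "string"}

validated = validate(stages, input_schema)  # succeeds

try:
    run(stages, input_schema)
    failed_at_runtime = False
except ValueError:
    # The second stage receives a "string" column where "vector" was promised.
    failed_at_runtime = True
```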

IMO the approach used by those examples ({{HashingTF}}, {{Binarizer}}) is correct, and other
transformers should do the same, no?
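
As a rough illustration of that pattern, a toy transformer might call its own schema method at the top of {{transform}} and then hold its output to the schema that call returned. This is a plain Python sketch under assumptions, not Spark code: {{ToyBinarizer}}, the dict-based schema and the list-of-dicts rows are all made up for the example.

```python
class ToyBinarizer:
    """Hypothetical stand-in for a feature transformer; not a Spark class."""

    def __init__(self, input_col, output_col, threshold=0.0):
        self.input_col = input_col
        self.output_col = output_col
        self.threshold = threshold

    def transform_schema(self, schema):
        # Parameter and input validation, then the declared output schema.
        if schema.get(self.input_col) != "double":
            raise ValueError(f"column {self.input_col!r} must exist and be 'double'")
        if self.output_col in schema:
            raise ValueError(f"output column {self.output_col!r} already exists")
        return {**schema, self.output_col: "double"}

    def transform(self, schema, rows):
        # Always derive the output schema first, so validation cannot be skipped.
        output_schema = self.transform_schema(schema)
        out_rows = [
            {**row, self.output_col: 1.0 if row[self.input_col] > self.threshold else 0.0}
            for row in rows
        ]
        # Hold the actual output to the declared schema (column names only here).
        for row in out_rows:
            if set(row) != set(output_schema):
                raise ValueError("output row does not match the declared schema")
        return output_schema, out_rows


binarizer = ToyBinarizer("x", "y", threshold=0.5)
schema, rows = binarizer.transform({"x": "double"}, [{"x": 0.2}, {"x": 0.9}])
```

With this shape, a bad parameter or input column fails inside {{transform}} itself, even when the caller never ran the upfront validation.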

> Feature transformers should always invoke transformSchema in transform or fit
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-14760
>                 URL: https://issues.apache.org/jira/browse/SPARK-14760
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: yuhao yang
>            Priority: Minor
>
> Since one of the primary functions of transformSchema is to conduct parameter validation,
> transformers should always invoke transformSchema in transform and fit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


