spark-issues mailing list archives

From "Joseph K. Bradley (JIRA)" <>
Subject [jira] [Commented] (SPARK-8418) Add single- and multi-value support to ML Transformers
Date Sun, 24 Dec 2017 07:21:00 GMT


Joseph K. Bradley commented on SPARK-8418:

One more thought: Looking at existing PRs and docs for inputCols & outputCols, I'm worried
it may be unclear to users how to use the multi-column APIs.  E.g., if the docs for
OneHotEncoderEstimator (or any of the others) talk about transforming a Numeric column to a
Vector column, then users may be confused about whether each inputCol is treated independently,
whether all inputCols are concatenated in the output, or something else.  I'm commenting on the
OHE PR but thought this was relevant to all of these PRs.
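
To make the two readings concrete, here is a minimal plain-Python sketch of the ambiguity. It is not Spark code: the helper names (one_hot, encode_independently, encode_concatenated), the column names, and the category sizes are all hypothetical, and rows are modelled as plain dicts of category indices.

```python
from typing import Dict, List

Row = Dict[str, object]


def one_hot(index: int, size: int) -> List[float]:
    """Return a dense one-hot vector of length `size` with a 1.0 at `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def encode_independently(rows: List[Row], input_cols: List[str],
                         output_cols: List[str], sizes: List[int]) -> List[Row]:
    """Reading 1: the i-th input column produces the i-th output column."""
    result = []
    for row in rows:
        new_row = dict(row)
        for in_col, out_col, size in zip(input_cols, output_cols, sizes):
            new_row[out_col] = one_hot(row[in_col], size)
        result.append(new_row)
    return result


def encode_concatenated(rows: List[Row], input_cols: List[str],
                        output_col: str, sizes: List[int]) -> List[Row]:
    """Reading 2: all input columns are encoded and joined into one vector."""
    result = []
    for row in rows:
        new_row = dict(row)
        merged: List[float] = []
        for in_col, size in zip(input_cols, sizes):
            merged.extend(one_hot(row[in_col], size))
        new_row[output_col] = merged
        result.append(new_row)
    return result


# Hypothetical data: two categorical columns with 2 and 3 categories.
rows = [{"color": 0, "shape": 2}, {"color": 1, "shape": 0}]
independent = encode_independently(
    rows, ["color", "shape"], ["colorVec", "shapeVec"], [2, 3])
concatenated = encode_concatenated(rows, ["color", "shape"], "features", [2, 3])
```

Under the first reading every row gains one vector per input column; under the second it gains a single vector whose length is the sum of the category counts. Docs that only say "Numeric column to Vector column" do not tell the user which of these to expect.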

> Add single- and multi-value support to ML Transformers
> ------------------------------------------------------
>                 Key: SPARK-8418
>                 URL:
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML
>            Reporter: Joseph K. Bradley
> It would be convenient if all feature transformers supported transforming columns of
> single values and multiple values, specifically:
> * one column with one value (e.g., type {{Double}})
> * one column with multiple values (e.g., {{Array[Double]}} or {{Vector}})
> We could go as far as supporting multiple columns, but that may not be necessary since
> VectorAssembler could be used to handle that.
> Estimators under {{ml.feature}} should also support this.
> This will likely require a short design doc to describe:
> * how input and output columns will be specified
> * schema validation
> * code sharing to reduce duplication
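
As an illustration of the single- versus multi-value idea in the description above, the following plain-Python sketch applies one element-wise function to either a scalar or a list of values, with a type check standing in for schema validation. It is only a sketch under stated assumptions: apply_transform and binarize are hypothetical helpers, not part of any Spark API, and a Python list stands in for Array[Double] or Vector.

```python
from typing import Callable, List, Union

Value = Union[float, List[float]]


def apply_transform(fn: Callable[[float], float], value: Value) -> Value:
    """Apply `fn` to a single value, or element-wise to a list of values."""
    if isinstance(value, (list, tuple)):
        # Multi-value column: the same logic is reused for every element.
        return [fn(v) for v in value]
    if isinstance(value, (int, float)):
        # Single-value column.
        return fn(float(value))
    # Stand-in for schema validation: reject unsupported input types.
    raise TypeError(f"unsupported input type: {type(value).__name__}")


def binarize(x: float, threshold: float = 0.5) -> float:
    """Hypothetical element-wise transform: 1.0 above the threshold, else 0.0."""
    return 1.0 if x > threshold else 0.0


single = apply_transform(binarize, 0.7)
multi = apply_transform(binarize, [0.2, 0.7, 0.5])
```

Routing both input shapes through one function is one way to share code between the single-value and multi-value cases; the output keeps the shape of the input.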

This message was sent by Atlassian JIRA
