spark-issues mailing list archives

From "Sean Owen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-22021) Add a feature transformation to accept a function and apply it on all rows of dataframe
Date Fri, 15 Sep 2017 11:14:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-22021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16167722#comment-16167722 ]

Sean Owen commented on SPARK-22021:
-----------------------------------

Why can't you just apply this function? Or implement a Transformer. I'm not sure what the point
of a transformer that just applies a function is.
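Both of Sean's suggestions can be sketched in Scala. This is a minimal sketch that assumes Spark 2.x with spark-sql and spark-mllib on the classpath. `ColumnSumTransformer`, `SumDemo` and the column names are hypothetical; they were chosen only to mirror the v1/v2 example in the issue description. Applying the expression with `withColumn` needs no new API. The custom Transformer wraps the same logic so it can be used as a Pipeline stage.

```scala
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.{Param, ParamMap, StringArrayParam}
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

// Hypothetical Transformer: adds the given numeric input columns into one output column.
class ColumnSumTransformer(override val uid: String) extends Transformer {
  def this() = this(Identifiable.randomUID("colSum"))

  // Params are declared directly on the class (see the note after this block).
  val inputCols = new StringArrayParam(this, "inputCols", "numeric columns to add")
  val outputCol = new Param[String](this, "outputCol", "name of the sum column")

  def setInputCols(value: Array[String]): this.type = set(inputCols, value)
  def setOutputCol(value: String): this.type = set(outputCol, value)

  override def transform(dataset: Dataset[_]): DataFrame = {
    // Builds v1 + v2 + ... as a single Column expression; assumes at least one input column.
    val sum = $(inputCols).map(col).reduce(_ + _)
    dataset.withColumn($(outputCol), sum.cast(DoubleType))
  }

  // Declares the extra output column so Pipeline schema validation knows about it.
  override def transformSchema(schema: StructType): StructType =
    schema.add(StructField($(outputCol), DoubleType, nullable = true))

  // defaultCopy relies on the (uid: String) constructor defined above.
  override def copy(extra: ParamMap): ColumnSumTransformer = defaultCopy(extra)
}

object SumDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("sum-demo").getOrCreate()
    import spark.implicits._

    val df = Seq((1.0, 2.0), (3.0, 4.0)).toDF("v1", "v2")

    // Option 1: just apply the expression; no Transformer involved.
    val direct = df.withColumn("result", col("v1") + col("v2"))

    // Option 2: the same logic wrapped in a Transformer, usable inside a Pipeline.
    val viaTransformer = new ColumnSumTransformer()
      .setInputCols(Array("v1", "v2"))
      .setOutputCol("result")
      .transform(df)

    direct.show()
    viaTransformer.show()
    spark.stop()
  }
}
```

The params are declared on the class instead of mixing in Spark's shared traits such as `HasInputCols`. Those traits are package-private to `org.apache.spark.ml` and cannot be used from user code.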

> Add a feature transformation to accept a function and apply it on all rows of dataframe
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-22021
>                 URL: https://issues.apache.org/jira/browse/SPARK-22021
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>    Affects Versions: 2.3.0
>            Reporter: Hosur Narahari
>
> More often than not, we generate derived features in an ML pipeline by applying mathematical
> or other operations to dataframe columns. Examples include summing a few columns into a new
> column, or computing the length of a text field such as a message. We currently don't have
> an efficient way to handle such scenarios in an ML pipeline.
> A transformer could accept a function, apply it to the specified columns, and generate an
> output column of numerical type. This would give users the flexibility to derive features
> by applying any domain-specific logic.
> Example:
> val function = "function(a,b) { return a+b;}"
> val transformer = new GenFuncTransformer().setInputCols(Array("v1", "v2")).setOutputCol("result").setFunction(function)
> val df = Seq((1.0, 2.0), (3.0, 4.0)).toDF("v1", "v2")
> val result = transformer.transform(df)
> result.show
> v1   v2   result
> 1.0  2.0  3.0
> 3.0  4.0  7.0
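Spark ML also has an existing Transformer that covers many cases like the one above. `SQLTransformer` in `org.apache.spark.ml.feature` is not mentioned in this thread. It takes a SQL statement in which `__THIS__` stands for the input dataset. The sketch below assumes Spark 2.x with spark-sql and spark-mllib on the classpath. The object name `SqlTransformerDemo` is hypothetical. The statement reproduces the v1 + v2 example from the issue with a SQL expression rather than a JavaScript function string.

```scala
import org.apache.spark.ml.feature.SQLTransformer
import org.apache.spark.sql.SparkSession

object SqlTransformerDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("sql-transformer-demo").getOrCreate()
    import spark.implicits._

    val df = Seq((1.0, 2.0), (3.0, 4.0)).toDF("v1", "v2")

    // __THIS__ is replaced by SQLTransformer with the dataset passed to transform().
    val transformer = new SQLTransformer()
      .setStatement("SELECT *, v1 + v2 AS result FROM __THIS__")

    // SQLTransformer is a regular Transformer, so it can also be a Pipeline stage.
    transformer.transform(df).show()
    spark.stop()
  }
}
```

The message-length case from the description works the same way. For example, the statement `SELECT *, length(message) AS message_length FROM __THIS__` would apply, assuming the input has a string column named `message`.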



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


