spark-issues mailing list archives

From "Shivaram Venkataraman (JIRA)" <>
Subject [jira] [Commented] (SPARK-6823) Add a model.matrix like capability to DataFrames (modelDataFrame)
Date Sun, 02 Aug 2015 21:36:04 GMT


Shivaram Venkataraman commented on SPARK-6823:

[~ekhliang] [~mengxr] Is this addressed by the StringType PR? I'm wondering if we can resolve
this issue.

> Add a model.matrix like capability to DataFrames (modelDataFrame)
> -----------------------------------------------------------------
>                 Key: SPARK-6823
>                 URL:
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML, SparkR
>            Reporter: Shivaram Venkataraman
> Currently, MLlib modeling tools work only with double data. However, data tables in practice
often have a set of categorical fields (factors in R) that need to be converted to a set
of 0/1 indicator variables (making the data actually used in a modeling algorithm completely
numeric). In R, this is handled in modeling functions using the model.matrix function. Similar
functionality needs to be available within Spark.
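
To make the requested conversion concrete, here is a minimal sketch in plain Python of expanding categorical fields into 0/1 indicator columns. It is not Spark or R code: the helper name expand_categoricals, the column naming scheme (field_level), and the sample rows are all assumptions made for illustration. It also emits one indicator per level, whereas R's model.matrix applies contrasts and typically drops a reference level.

```python
def expand_categoricals(rows, categorical):
    """Replace each categorical field with 0/1 indicator columns.

    rows: list of dicts, one per record.
    categorical: names of the fields to expand.
    Hypothetical helper for illustration; not a Spark or R API.
    """
    # Collect the sorted distinct levels of every categorical field.
    levels = {f: sorted({r[f] for r in rows}) for f in categorical}
    out = []
    for r in rows:
        new = {}
        for key, value in r.items():
            if key in levels:
                # One indicator column per level: 1.0 if it matches, else 0.0.
                for level in levels[key]:
                    new[f"{key}_{level}"] = 1.0 if value == level else 0.0
            else:
                # Non-categorical fields are passed through as doubles.
                new[key] = float(value)
        out.append(new)
    return out


# Made-up sample data with one numeric and one categorical field.
rows = [
    {"age": 31, "city": "SF"},
    {"age": 45, "city": "NYC"},
    {"age": 27, "city": "SF"},
]
encoded = expand_categoricals(rows, ["city"])
print(encoded[0])  # {'age': 31.0, 'city_NYC': 0.0, 'city_SF': 1.0}
```

Every value in the result is a double, which is the form the description says the modeling tools require.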

This message was sent by Atlassian JIRA
