spark-issues mailing list archives

From "zhengruifeng (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-13677) Support Tree-Based Feature Transformation for ML
Date Wed, 04 Jan 2017 02:50:58 GMT

    [ https://issues.apache.org/jira/browse/SPARK-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796968#comment-15796968 ]

zhengruifeng commented on SPARK-13677:
--------------------------------------

Not at all. I know you committers are busy. I will add an API here.

> Support Tree-Based Feature Transformation for ML
> ------------------------------------------------
>
>                 Key: SPARK-13677
>                 URL: https://issues.apache.org/jira/browse/SPARK-13677
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: zhengruifeng
>            Priority: Minor
>
> It would be nice to be able to use RF and GBT for feature transformation:
> First fit an ensemble of trees (such as RF, GBT, or other TreeEnsembleModels) on the training
> set. Then each leaf of each tree in the ensemble is assigned a fixed arbitrary feature index
> in a new feature space. These leaf indices are then one-hot encoded.
> This method was first introduced by Facebook (http://www.herbrich.me/papers/adclicksfacebook.pdf),
> and is implemented in two well-known libraries:
> sklearn (http://scikit-learn.org/stable/auto_examples/ensemble/plot_feature_transformation.html#example-ensemble-plot-feature-transformation-py)
> xgboost (https://github.com/dmlc/xgboost/blob/master/demo/guide-python/predict_leaf_indices.py)
> I have implemented it in MLlib:
> val features : RDD[Vector] = ...
> val model1 : RandomForestModel = ...
> val transformed1 : RDD[Vector] = model1.leaf(features)
> val model2 : GradientBoostedTreesModel = ...
> val transformed2 : RDD[Vector] = model2.leaf(features)
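
For readers unfamiliar with the technique, the leaf-index encoding described in the issue can be sketched without Spark. This is only an illustration under stated assumptions, not the code proposed in this ticket: the Node, Leaf, and Split types and the LeafEncoding object are hypothetical names, and leaf ids are assumed to run from 0 to numLeaves - 1 within each tree.

```scala
// Hypothetical minimal tree types for illustration; these are not Spark classes.
sealed trait Node
final case class Leaf(id: Int) extends Node
final case class Split(feature: Int, threshold: Double, left: Node, right: Node) extends Node

object LeafEncoding {
  // Number of leaves in a tree, i.e. the width of that tree's one-hot block.
  def numLeaves(node: Node): Int = node match {
    case Leaf(_)           => 1
    case Split(_, _, l, r) => numLeaves(l) + numLeaves(r)
  }

  // Follow the splits from the root and return the id of the leaf that x reaches.
  @annotation.tailrec
  def leafId(node: Node, x: Array[Double]): Int = node match {
    case Leaf(id)          => id
    case Split(f, t, l, r) => if (x(f) <= t) leafId(l, x) else leafId(r, x)
  }

  // Concatenate one one-hot block per tree; each block marks the leaf that x reaches.
  // Assumes leaf ids within a tree are 0 until numLeaves(tree).
  def transform(trees: Seq[Node], x: Array[Double]): Array[Double] = {
    val offsets = trees.map(numLeaves).scanLeft(0)(_ + _) // start index of each block
    val out = Array.fill(offsets.last)(0.0)
    trees.zip(offsets).foreach { case (tree, offset) =>
      out(offset + leafId(tree, x)) = 1.0
    }
    out
  }

  def main(args: Array[String]): Unit = {
    val tree1 = Split(0, 0.5, Leaf(0), Leaf(1))
    val tree2 = Split(1, 2.0, Leaf(0), Split(0, 0.8, Leaf(1), Leaf(2)))
    // The sample reaches leaf 1 of tree1 and leaf 1 of tree2.
    val encoded = transform(Seq(tree1, tree2), Array(0.7, 3.0))
    println(encoded.mkString(", ")) // prints 0.0, 1.0, 0.0, 1.0, 0.0
  }
}
```

The output has one slot per leaf across the whole ensemble, so its length is the total leaf count and exactly one entry per tree is set. A sparse vector would be the natural representation for large ensembles; a dense array is used here only to keep the sketch short.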



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

