spark-issues mailing list archives

From "Xiangrui Meng (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-3207) Choose splits for continuous features in DecisionTree more adaptively
Date Mon, 20 Oct 2014 20:13:33 GMT

     [ https://issues.apache.org/jira/browse/SPARK-3207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng updated SPARK-3207:
---------------------------------
    Assignee: Qiping Li

> Choose splits for continuous features in DecisionTree more adaptively
> ---------------------------------------------------------------------
>
>                 Key: SPARK-3207
>                 URL: https://issues.apache.org/jira/browse/SPARK-3207
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: Joseph K. Bradley
>            Assignee: Qiping Li
>            Priority: Minor
>             Fix For: 1.2.0
>
>
> DecisionTree splits on continuous features by choosing an array of values from a subsample
> of the data.
> Currently, it does not check for identical values in the subsample, so it could end up
> having multiple copies of the same split.  This is not an error, but it could be improved
> to be more adaptive to the data.
> Proposal: In findSplitsBins, check for identical values, and do some searching in order
> to find a set of unique splits.  Reduce the number of splits if there are not enough unique
> candidates.
> This would require modifying findSplitsBins and making sure that the number of splits/bins
> (chosen adaptively) is set correctly elsewhere in the code (such as in DecisionTreeMetadata).
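
The proposal above can be illustrated with a small standalone sketch. This is not the Spark implementation and does not use any MLlib API: the function name find_unique_splits, its parameters, and the convention that a threshold v means "feature <= v" are all assumptions made for illustration. It counts distinct values in the subsample, returns every usable distinct value when there are too few candidates (so the number of splits shrinks), and otherwise walks the cumulative counts to pick approximately evenly spaced, non-repeating thresholds.

```python
from collections import Counter
from typing import List, Sequence


def find_unique_splits(sample: Sequence[float], max_splits: int) -> List[float]:
    """Pick at most max_splits distinct thresholds for one continuous feature.

    Hypothetical helper for illustration only; a threshold v is assumed to
    mean the split "feature <= v".
    """
    if max_splits <= 0 or not sample:
        return []

    # (value, count) pairs sorted by value; duplicates collapse into one entry.
    value_counts = sorted(Counter(sample).items())

    # The largest value is not a useful threshold: "x <= max" keeps everything.
    candidates = value_counts[:-1]

    # Not enough unique candidates: return them all, i.e. use fewer splits.
    if len(candidates) <= max_splits:
        return [value for value, _ in candidates]

    # Otherwise aim for the k-th of (max_splits + 1) equal-frequency targets,
    # compared with integers to avoid floating-point drift.
    n = len(sample)
    parts = max_splits + 1
    splits: List[float] = []
    seen = 0  # number of sample points with value <= current candidate
    k = 1     # index of the next target, located at k * n / parts
    for value, count in candidates:
        seen += count
        if seen * parts >= k * n:
            splits.append(value)
            # Skip every target this value already covers, so a heavily
            # repeated value is emitted once rather than several times.
            while seen * parts >= k * n:
                k += 1
    return splits


if __name__ == "__main__":
    # Few distinct values: the number of splits is reduced from 3 to 2.
    print(find_unique_splits([1, 1, 1, 1, 2, 3], max_splits=3))
    # Many duplicates of one value: it appears as a threshold only once.
    print(find_unique_splits([0] * 8 + [1, 2, 3, 4, 5], max_splits=2))
```

Because the returned list can be shorter than max_splits, a caller would have to record the actual number of splits per feature, which is the bookkeeping concern the issue raises for DecisionTreeMetadata.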



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

