spark-issues mailing list archives

From "Joseph K. Bradley (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-3160) Simplify DecisionTree data structure for training
Date Wed, 20 Aug 2014 22:23:26 GMT
Joseph K. Bradley created SPARK-3160:
----------------------------------------

             Summary: Simplify DecisionTree data structure for training
                 Key: SPARK-3160
                 URL: https://issues.apache.org/jira/browse/SPARK-3160
             Project: Spark
          Issue Type: Improvement
          Components: MLlib
            Reporter: Joseph K. Bradley
            Priority: Minor


Improvement: code clarity

Currently, training maintains three separate structures: a tree structure, a flat array of nodes, and a parentImpurities array.

Proposed fix: Maintain everything within a growing tree structure.  For this, we could have
a “LearningNode extends Node” setup where the LearningNode holds metadata for learning
(such as impurities).  The test-time model could be extracted from this training-time model,
so that extra information (such as impurities) does not have to be kept after training.
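
The proposal above could look roughly like the following sketch. It is written in Java for illustration only (MLlib itself is Scala), and every class, field, and method name other than Node and LearningNode is a hypothetical choice, not MLlib's actual API. The idea is that LearningNode carries the training-only impurity, the tree is grown in place, and toNode() copies out a plain Node tree that drops the metadata.

```java
// Hypothetical sketch; this does not mirror MLlib's real classes.

// Test-time node: holds only what prediction needs.
class Node {
    double prediction;
    int splitFeature = -1;   // -1 marks a leaf
    double threshold;
    Node left;
    Node right;

    Node(double prediction) {
        this.prediction = prediction;
    }

    boolean isLeaf() {
        return left == null && right == null;
    }

    // Walk down the tree until a leaf is reached.
    double predict(double[] features) {
        if (isLeaf()) {
            return prediction;
        }
        Node child = features[splitFeature] <= threshold ? left : right;
        return child.predict(features);
    }
}

// Training-time node: adds metadata that is only useful while learning.
class LearningNode extends Node {
    final double impurity;   // replaces a separate parentImpurities array

    LearningNode(double prediction, double impurity) {
        super(prediction);
        this.impurity = impurity;
    }

    // Grow the tree in place by turning this leaf into an internal node.
    void split(int feature, double threshold,
               LearningNode left, LearningNode right) {
        this.splitFeature = feature;
        this.threshold = threshold;
        this.left = left;
        this.right = right;
    }

    // Extract the test-time model: a copy without training metadata.
    Node toNode() {
        Node plain = new Node(prediction);
        if (!isLeaf()) {
            plain.splitFeature = splitFeature;
            plain.threshold = threshold;
            plain.left = ((LearningNode) left).toNode();
            plain.right = ((LearningNode) right).toNode();
        }
        return plain;
    }
}
```

Under these assumptions, training code would build a tree of LearningNode objects, call split() as nodes are expanded, and call toNode() on the root once training ends, so the impurities are not kept in the final model.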




--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

