spark-issues mailing list archives

From "Joseph K. Bradley (JIRA)"
Subject [jira] [Created] (SPARK-3160) Simplify DecisionTree data structure for training
Date Wed, 20 Aug 2014 22:23:26 GMT
Joseph K. Bradley created SPARK-3160:

             Summary: Simplify DecisionTree data structure for training
                 Key: SPARK-3160
             Project: Spark
          Issue Type: Improvement
          Components: MLlib
            Reporter: Joseph K. Bradley
            Priority: Minor

Improvement: code clarity

Currently, training maintains three separate structures: a tree structure, a flat array of
nodes, and a parentImpurities array.

Proposed fix: Maintain everything within a single growing tree structure. For this, we could
have a “LearningNode extends Node” setup in which the LearningNode holds metadata needed only
for learning (such as impurities). The test-time model could then be extracted from this
training-time model, so that the extra information (such as impurities) does not have to be
kept after training.
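
The proposal could be sketched roughly as follows. This is only an illustration of the idea, not MLlib code: apart from the names Node and LearningNode, which come from the description above, everything here (Split, TreeModel.extract, predictOn, and all fields) is a hypothetical name chosen for the example.

```scala
// Hypothetical sketch of the "LearningNode extends Node" idea.
// None of these signatures are taken from MLlib.

// A simple continuous split: go left when features(feature) <= threshold.
final case class Split(feature: Int, threshold: Double)

// Test-time node: holds only what prediction needs.
class Node(val id: Int, var predict: Double) {
  var split: Option[Split] = None
  var left: Option[Node] = None
  var right: Option[Node] = None

  def isLeaf: Boolean = split.isEmpty

  // Walk down the tree until a node without a complete split is reached.
  def predictOn(features: Array[Double]): Double = (split, left, right) match {
    case (Some(s), Some(l), Some(r)) =>
      if (features(s.feature) <= s.threshold) l.predictOn(features)
      else r.predictOn(features)
    case _ => predict
  }
}

// Training-time node: the same tree shape plus learning-only metadata,
// so no separate flat node array or parentImpurities array is needed.
class LearningNode(nodeId: Int, prediction: Double, val impurity: Double)
  extends Node(nodeId, prediction)

object TreeModel {
  // Copy a training-time tree into plain Nodes, dropping learning metadata.
  def extract(node: Node): Node = {
    val copy = new Node(node.id, node.predict)
    copy.split = node.split
    copy.left = node.left.map(extract)
    copy.right = node.right.map(extract)
    copy
  }
}
```

With a layout like this, training would grow a tree of LearningNode instances and call something like TreeModel.extract(root) once at the end; the returned tree contains only plain Node objects, so impurities are not retained in the model.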
