spark-issues mailing list archives

From Yan Facai (颜发才) (JIRA) <j...@apache.org>
Subject [jira] [Commented] (SPARK-3728) RandomForest: Learn models too large to store in memory
Date Wed, 22 Mar 2017 12:26:41 GMT

    [ https://issues.apache.org/jira/browse/SPARK-3728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15936208#comment-15936208 ]

Yan Facai (颜发才) commented on SPARK-3728:
----------------------------------------

RandomForest already uses a stack to store nodes, as [~jgfidelis] said before. However, all
the trees are still kept in memory; see `topNodes`.

Perhaps writing trees to disk is still needed when too many trees are trained.
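
A minimal sketch of that idea (not the actual MLlib code; `trainSingleTree`, `TreeModel`, and the file layout are hypothetical): each tree is serialized to disk as soon as it finishes, so only a file reference, not the tree's `topNode`, stays in memory.

```scala
import java.io.{File, FileOutputStream, ObjectOutputStream}

// Hypothetical stand-ins for illustration only.
case class TreeModel(rootDescription: String) extends Serializable

def trainSingleTree(treeIndex: Int): TreeModel = {
  // Placeholder for growing one tree with a LIFO node stack.
  TreeModel(s"tree-$treeIndex")
}

// Train trees one at a time and spill each finished tree to disk,
// instead of accumulating every topNode in memory.
def trainForestToDisk(numTrees: Int, dir: File): Seq[File] = {
  (0 until numTrees).map { i =>
    val tree = trainSingleTree(i)            // only this tree's nodes are live
    val out  = new File(dir, s"tree-$i.bin")
    val oos  = new ObjectOutputStream(new FileOutputStream(out))
    try oos.writeObject(tree) finally oos.close()
    out                                      // keep a file reference, not the tree
  }
}
```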

> RandomForest: Learn models too large to store in memory
> -------------------------------------------------------
>
>                 Key: SPARK-3728
>                 URL: https://issues.apache.org/jira/browse/SPARK-3728
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: Joseph K. Bradley
>            Priority: Minor
>
> Proposal: Write trees to disk as they are learned.
> RandomForest currently uses a FIFO queue, which means training all trees at once via
> breadth-first search.  Using a FILO queue would encourage the code to finish one tree
> before moving on to new ones.  This would allow the code to write trees to disk as they
> are learned.
> Note: It would also be possible to write nodes to disk as they are learned using a FIFO
> queue, once the example--node mapping is cached [JIRA].  The [Sequoia Forest package]()
> does this.  However, it could be useful to learn trees progressively, so that future
> functionality such as early stopping (training fewer trees than expected) could be
> supported.
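
To illustrate the FIFO-vs-FILO contrast in the description above, here is a small scheduling sketch (hypothetical `NodeTask` type, two children per node up to a fixed depth): a FIFO queue interleaves nodes from every tree (breadth-first across the forest), while a LIFO stack finishes all of one tree's nodes before starting the next, which is what would let each tree be written out as soon as it completes.

```scala
import scala.collection.mutable

// Hypothetical node descriptor for illustration only.
case class NodeTask(treeId: Int, depth: Int)

// Expand a node into two children until maxDepth (placeholder for real splitting).
def children(n: NodeTask, maxDepth: Int): Seq[NodeTask] =
  if (n.depth < maxDepth)
    Seq(NodeTask(n.treeId, n.depth + 1), NodeTask(n.treeId, n.depth + 1))
  else Seq.empty

def visitOrder(numTrees: Int, maxDepth: Int, lifo: Boolean): Seq[NodeTask] = {
  val roots   = (0 until numTrees).map(t => NodeTask(t, 0))
  val visited = mutable.ArrayBuffer.empty[NodeTask]
  if (lifo) {
    val stack = mutable.Stack[NodeTask](roots: _*)
    while (stack.nonEmpty) {
      val n = stack.pop(); visited += n
      children(n, maxDepth).foreach(stack.push)   // LIFO: stays inside one tree until done
    }
  } else {
    val queue = mutable.Queue[NodeTask](roots: _*)
    while (queue.nonEmpty) {
      val n = queue.dequeue(); visited += n
      queue ++= children(n, maxDepth)             // FIFO: interleaves nodes from all trees
    }
  }
  visited.toSeq
}

// visitOrder(2, 2, lifo = true)  visits every node of tree 0, then tree 1;
// visitOrder(2, 2, lifo = false) alternates between the two trees level by level.
```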



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


