spark-issues mailing list archives

From "Manoj Kumar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-5972) Cache residuals for GradientBoostedTrees during training
Date Wed, 01 Apr 2015 21:40:53 GMT

    [ https://issues.apache.org/jira/browse/SPARK-5972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391521#comment-14391521 ]

Manoj Kumar commented on SPARK-5972:
------------------------------------

[~josephkb] This should be done independently of evaluateEachIteration, right? That is,
the GradientBoostedTrees training code that caches the error and residuals should not
call evaluateEachIteration, since the model has not finished training at that point.



> Cache residuals for GradientBoostedTrees during training
> --------------------------------------------------------
>
>                 Key: SPARK-5972
>                 URL: https://issues.apache.org/jira/browse/SPARK-5972
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.3.0
>            Reporter: Joseph K. Bradley
>            Priority: Minor
>
> In gradient boosting, the current model's prediction is re-computed for each training
> instance on every iteration.  The current residual (cumulative prediction of previously
> trained trees in the ensemble) should be cached.  That could reduce both computation
> (only computing the prediction of the most recently trained tree) and communication
> (only sending the most recently trained tree to the workers).
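
For reference, here is a minimal Scala sketch of the caching idea. It is an illustration
only, not existing MLlib API: updatePredictions is a hypothetical helper, and it assumes
the cumulative per-instance predictions are kept in an RDD zipped with the training data
on each iteration.

import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.model.DecisionTreeModel

// Hypothetical helper: fold only the newest tree's predictions into the
// cached cumulative predictions, rather than re-scoring the whole ensemble.
def updatePredictions(
    data: RDD[LabeledPoint],
    cumulative: RDD[Double],  // cached sum over previously trained trees
    newTree: DecisionTreeModel,
    learningRate: Double): RDD[Double] = {
  data.zip(cumulative).map { case (point, pred) =>
    pred + learningRate * newTree.predict(point.features)
  }
}

Each iteration would then only need to ship newTree to the workers; the driver would
persist the returned RDD and unpersist the previous one.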



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
