spark-commits mailing list archives

From sro...@apache.org
Subject spark git commit: [SPARK-22075][ML] GBTs unpersist datasets cached by Checkpointer
Date Thu, 21 Sep 2017 19:05:47 GMT
Repository: spark
Updated Branches:
  refs/heads/master 9cac249fd -> b21b806ec


[SPARK-22075][ML] GBTs unpersist datasets cached by Checkpointer

## What changes were proposed in this pull request?
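`PeriodicRDDCheckpointer` automatically persists the last 3 datasets passed to `PeriodicRDDCheckpointer.update()`.
In GBTs, the last 3 intermediate RDDs therefore remain cached after `fit()`. This patch calls `unpersistDataSet()` on both checkpointers before their checkpoints are deleted.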
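
The caching behaviour described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Spark's implementation: `FakeDataset`, `ToyCheckpointer`, and `ToyCheckpointerDemo` are hypothetical names introduced here, and the window of 3 is taken from the description above. Only the method names `update` and `unpersistDataSet` mirror the calls mentioned in this commit.

```scala
import scala.collection.mutable

// Hypothetical stand-in for an RDD: it tracks only whether it is cached.
final class FakeDataset(val id: Int) {
  var cached: Boolean = false
}

// Simplified model of the checkpointer's caching window (not Spark's code).
final class ToyCheckpointer(maxPersisted: Int = 3) {
  private val persistedQueue = mutable.Queue.empty[FakeDataset]

  // Cache the new dataset, then evict the oldest ones beyond the window.
  def update(ds: FakeDataset): Unit = {
    ds.cached = true
    persistedQueue.enqueue(ds)
    while (persistedQueue.size > maxPersisted) {
      persistedQueue.dequeue().cached = false
    }
  }

  // Release everything still held in the window.
  def unpersistDataSet(): Unit = {
    while (persistedQueue.nonEmpty) {
      persistedQueue.dequeue().cached = false
    }
  }
}

object ToyCheckpointerDemo {
  // Runs `iterations` updates and returns how many datasets remain cached.
  def cachedAfterTraining(iterations: Int, cleanUp: Boolean): Int = {
    val checkpointer = new ToyCheckpointer()
    val datasets = (1 to iterations).map(new FakeDataset(_))
    datasets.foreach(checkpointer.update)
    if (cleanUp) checkpointer.unpersistDataSet()
    datasets.count(_.cached)
  }

  def main(args: Array[String]): Unit = {
    println(cachedAfterTraining(10, cleanUp = false)) // prints 3
    println(cachedAfterTraining(10, cleanUp = true))  // prints 0
  }
}
```

In this model, skipping the final clean-up leaves the last 3 datasets cached, which corresponds to the leak described above; calling `unpersistDataSet()` at the end releases them.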

## How was this patch tested?
Existing tests and a local test in spark-shell.

Author: Zheng RuiFeng <ruifengz@foxmail.com>

Closes #19288 from zhengruifeng/gbt_unpersist.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b21b806e
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b21b806e
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b21b806e

Branch: refs/heads/master
Commit: b21b806ecc55f15575833c1e859c35ae391ff369
Parents: 9cac249
Author: Zheng RuiFeng <ruifengz@foxmail.com>
Authored: Thu Sep 21 20:05:44 2017 +0100
Committer: Sean Owen <sowen@cloudera.com>
Committed: Thu Sep 21 20:05:44 2017 +0100

----------------------------------------------------------------------
 .../scala/org/apache/spark/ml/tree/impl/GradientBoostedTrees.scala | 2 ++
 1 file changed, 2 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/b21b806e/mllib/src/main/scala/org/apache/spark/ml/tree/impl/GradientBoostedTrees.scala
----------------------------------------------------------------------
diff --git a/mllib/src/main/scala/org/apache/spark/ml/tree/impl/GradientBoostedTrees.scala b/mllib/src/main/scala/org/apache/spark/ml/tree/impl/GradientBoostedTrees.scala
index ce2bd7b..e32447a 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/tree/impl/GradientBoostedTrees.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/tree/impl/GradientBoostedTrees.scala
@@ -360,7 +360,9 @@ private[spark] object GradientBoostedTrees extends Logging {
     logInfo("Internal timing for DecisionTree:")
     logInfo(s"$timer")
 
+    predErrorCheckpointer.unpersistDataSet()
     predErrorCheckpointer.deleteAllCheckpoints()
+    validatePredErrorCheckpointer.unpersistDataSet()
     validatePredErrorCheckpointer.deleteAllCheckpoints()
     if (persistedInput) input.unpersist()
 

