spark-reviews mailing list archives

From srowen <>
Subject [GitHub] spark pull request #16415: [SPARK-19007]Speedup and optimize the GradientBoo...
Date Wed, 28 Dec 2016 12:11:31 GMT
Github user srowen commented on a diff in the pull request:
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/GradientBoostedTrees.scala
    @@ -329,8 +330,16 @@ private[spark] object GradientBoostedTrees extends Logging {
           //       However, the behavior should be reasonable, though not optimal.
           baseLearnerWeights(m) = learningRate
    +      if (pre_predError.getStorageLevel != StorageLevel.NONE ){
    --- End diff --
    Got it; the one thing I missed is that you're adding a call to `persist()`, and that is the key. You've hardcoded `MEMORY_AND_DISK`, though; you should probably match the persistence level of the input, if any, as other similar blocks of code do. That gives the caller a way to turn this off.
    I don't see how this relates to running out of memory, though ... if anything, persisting here uses more memory? Still, it seems like an improvement, and that may be unrelated.
    (Will `previousPredError` ever be unpersisted after the loop completes?)
    (You can probably also dispense with checking that the storage level is not NONE, because `unpersist()` just does nothing in that case.)
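
The pattern the review suggests could be sketched as follows. This is illustrative only, not the PR's actual code: `input` stands for the training-data RDD and `intermediate` for a per-iteration RDD such as the prediction-error RDD.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Sketch: mirror the caller's persistence choice instead of
// hard-coding MEMORY_AND_DISK. If the input RDD was never persisted,
// the intermediate RDD is left unpersisted too, so the caller can
// effectively turn caching off.
def persistLikeInput[T](input: RDD[_], intermediate: RDD[T]): RDD[T] = {
  if (input.getStorageLevel != StorageLevel.NONE) {
    intermediate.persist(input.getStorageLevel)
  }
  intermediate
}

// Cleanup after the loop: unpersist() can be called unconditionally,
// since it is a no-op on an RDD that was never persisted — no need to
// guard it with a StorageLevel.NONE check.
// previousIntermediate.unpersist()
```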
