spark-reviews mailing list archives

From WeichenXu123 <>
Subject [GitHub] spark pull request #19904: [SPARK-22707][ML] Optimize CrossValidator fitting...
Date Wed, 06 Dec 2017 03:17:27 GMT
GitHub user WeichenXu123 opened a pull request:

    [SPARK-22707][ML] Optimize CrossValidator fitting memory occupation by models

    ## What changes were proposed in this pull request?
    Testing shows that CrossValidator still has a memory issue: during fitting it occupies `O(n*sizeof(model))`
memory to hold models, whereas a well-optimized implementation should need only `O(parallelism*sizeof(model))`.
    This is because each future in `modelFutures` keeps a reference to its model object after the future completes
(the model can still be fetched with `future.value.get.get`), and both `Future.sequence` and the `modelFutures`
array hold references to every model future. All model objects therefore stay referenced until
`fit` returns, so memory use remains `O(n*sizeof(model))`.
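    The retention described above can be reproduced without Spark. The following is only a minimal sketch using plain `scala.concurrent` futures; `FakeModel`, `fitAll`, and `retainedCount` are hypothetical names introduced for illustration and are not part of CrossValidator.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object RetainedModels {
  // Hypothetical stand-in for a fitted model; the array mimics its memory footprint.
  final case class FakeModel(id: Int, weights: Array[Double])

  // One future per "model", analogous in spirit to the modelFutures array.
  def fitAll(n: Int): Seq[Future[FakeModel]] =
    (0 until n).map(i => Future(FakeModel(i, new Array[Double](1000))))

  // Counts how many models are still reachable through their completed futures.
  def retainedCount(n: Int): Int = {
    val modelFutures = fitAll(n)
    Await.result(Future.sequence(modelFutures), 30.seconds)
    // A completed future keeps its result, so each model is still reachable
    // through future.value for as long as the future itself is referenced.
    modelFutures.count(_.value.exists(_.isSuccess))
  }

  def main(args: Array[String]): Unit =
    println(s"models still reachable: ${retainedCount(8)}")
}
```

Because the sequence of futures stays in scope until the method returns, none of the models can be garbage collected before then, regardless of how many were fitted concurrently.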
    I fix this by merging `modelFuture` and `foldMetricFuture` into a single future, and by using `wait/notify`
to unpersist the training dataset promptly.
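    As a rough illustration of the merged-future idea (not the code in this patch), the sketch below fits and evaluates inside one future so that only the metric escapes. All names (`FakeModel`, `fit`, `evaluate`, `metrics`, `releaseTraining`) are hypothetical, and an `AtomicInteger` countdown is swapped in for the `wait/notify` signalling, under the assumption that the training data can be released once every fit has finished.

```scala
import java.util.concurrent.atomic.{AtomicBoolean, AtomicInteger}
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object MergedFitAndEvaluate {
  // Hypothetical stand-in for a fitted model.
  final case class FakeModel(weights: Array[Double])

  // Placeholder "training": fills the weights with the parameter index.
  def fit(paramIndex: Int): FakeModel =
    FakeModel(Array.fill(1000)(paramIndex.toDouble))

  // Placeholder "metric": the mean weight.
  def evaluate(model: FakeModel): Double =
    model.weights.sum / model.weights.length

  def metrics(numParams: Int, releaseTraining: () => Unit): Seq[Double] = {
    val pendingFits = new AtomicInteger(numParams)
    val metricFutures = (0 until numParams).map { i =>
      Future {
        val model = fit(i)
        // The last fit to finish releases the training data
        // (a stand-in for unpersisting the training dataset).
        if (pendingFits.decrementAndGet() == 0) releaseTraining()
        // Only the Double is stored in the future; the model is
        // unreachable once this block returns.
        evaluate(model)
      }
    }
    Await.result(Future.sequence(metricFutures), 30.seconds)
  }

  def main(args: Array[String]): Unit = {
    val released = new AtomicBoolean(false)
    val result = metrics(4, () => released.set(true))
    println(s"metrics = $result, training released = ${released.get}")
  }
}
```

With this shape, the number of live models is bounded by how many futures run at once rather than by the total number of futures, which is the `O(parallelism*sizeof(model))` behaviour the description aims for.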
    ## How was this patch tested?

You can merge this pull request into a Git repository by running:

    $ git pull fix_cross_validator_memory_issue

Alternatively you can review and apply these changes as the patch at:

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19904
commit 7725fd8a86dddba6c61c7d053dfa510a114bebb8
Author: WeichenXu <>
Date:   2017-12-05T11:45:42Z

    init pr


