spark-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From WeichenXu123 <...@git.apache.org>
Subject [GitHub] spark pull request #19904: [SPARK-22707][ML] Optimize CrossValidator fitting...
Date Wed, 06 Dec 2017 03:17:27 GMT
GitHub user WeichenXu123 opened a pull request:

    https://github.com/apache/spark/pull/19904

    [SPARK-22707][ML] Optimize CrossValidator fitting memory occupation by models

    ## What changes were proposed in this pull request?
    
    Via some test I found CrossValidator still exists memory issue, it will still occupy `O(n*sizeof(model))`
memory for holding models when fitting, if well optimized, it should be `O(parallelism*sizeof(model))`
    
    This is because modelFutures will hold the reference to model object after future is complete
(we can use `future.value.get.get` to fetch it), and the `Future.sequence` and the `modelFutures`
array holds references to each model future. So all model object are keep referenced until
`fit` return. So it will still occupy `O(n*sizeof(model))` memory.
    
    I fix this by merging the `modelFuture` and `foldMetricFuture` together, and via `wait/notify`
to unpersist training dataset in time.
    
    ## How was this patch tested?
    
    N/A


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/WeichenXu123/spark fix_cross_validator_memory_issue

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19904.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19904
    
----
commit 7725fd8a86dddba6c61c7d053dfa510a114bebb8
Author: WeichenXu <weichen.xu@databricks.com>
Date:   2017-12-05T11:45:42Z

    init pr

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Mime
View raw message