spark-reviews mailing list archives

From sitalkedia <...@git.apache.org>
Subject [GitHub] spark pull request #18150: Cleanup shuffle
Date Tue, 30 May 2017 22:26:22 GMT
GitHub user sitalkedia opened a pull request:

    https://github.com/apache/spark/pull/18150

    Cleanup shuffle

    ## What changes were proposed in this pull request?
    
    Currently, when we detect a fetch failure, we remove only the shuffle files produced by
the executor, even though the host itself might be down, making all of its shuffle files
inaccessible. When multiple executors run on a host, a single host going down currently causes
multiple fetch failures and multiple retries of the stage, which is very inefficient. If we
instead remove all the shuffle files on that host on the first fetch failure, we can rerun all
the tasks on that host in a single stage retry.
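A minimal sketch of the idea, using a hypothetical in-memory registry. `ShuffleOutputRegistry`, `Location`, `unregisterExecutor`, `unregisterHost` and `missing` are illustrative names, not Spark's `MapOutputTracker` API. The sketch contrasts forgetting one executor's map outputs with forgetting every output on the failed host:

```scala
import scala.collection.mutable

// Hypothetical model of map-output bookkeeping; the names are illustrative
// and are not Spark's MapOutputTracker API.
final case class Location(host: String, executorId: String)

final class ShuffleOutputRegistry {
  // (shuffleId, mapId) -> location of that map task's output file
  private val outputs = mutable.Map.empty[(Int, Int), Location]

  def register(shuffleId: Int, mapId: Int, loc: Location): Unit =
    outputs((shuffleId, mapId)) = loc

  // Previous behaviour: forget only outputs written by the failing executor.
  def unregisterExecutor(executorId: String): Set[(Int, Int)] =
    removeWhere(_.executorId == executorId)

  // Proposed behaviour: forget every output stored on the failing host.
  def unregisterHost(host: String): Set[(Int, Int)] =
    removeWhere(_.host == host)

  // Map partitions of a shuffle that must be recomputed in the next stage attempt.
  def missing(shuffleId: Int, numMaps: Int): List[Int] =
    (0 until numMaps).filterNot(m => outputs.contains((shuffleId, m))).toList

  private def removeWhere(p: Location => Boolean): Set[(Int, Int)] = {
    // Materialize the matching keys first so the map is not mutated while iterating.
    val lost = outputs.iterator.collect { case (k, loc) if p(loc) => k }.toSet
    outputs --= lost
    lost
  }
}

// Host "host1" runs two executors; both hold map outputs of shuffle 0.
val registry = new ShuffleOutputRegistry
registry.register(0, 0, Location("host1", "exec-1"))
registry.register(0, 1, Location("host1", "exec-2"))
registry.register(0, 2, Location("host2", "exec-3"))

// A fetch from exec-1 fails because host1 is down: drop everything on host1,
// so maps 0 and 1 are rerun together in a single stage retry.
registry.unregisterHost("host1")
println(registry.missing(0, 3)) // List(0, 1)
```

Had the registry called `unregisterExecutor("exec-1")` instead, only map 0 would be missing. The retried stage would then fail again when fetching from `exec-2` on the same dead host, which is the repeated-retry pattern described above.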
    
    ## How was this patch tested?
    
    Added unit tests, and ran a job on a cluster to confirm that the multiple stage retries no longer occur.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sitalkedia/spark cleanup_shuffle

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18150.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18150
    
----
commit 74ca88bc1d2b67cc12ea32a3cd344ec0259500a9
Author: Sital Kedia <skedia@fb.com>
Date:   2017-02-25T00:35:00Z

    [SPARK-19753][CORE] All shuffle files on a host should be removed in case of fetch failure or slave lost

commit 6898c2bb0f8a65dcc488e53b248fbeaec64efdb8
Author: Sital Kedia <skedia@fb.com>
Date:   2017-03-01T02:03:55Z

    Do not un-register shuffle files in case of executor lost

commit 32a2315caa07a5a6be1bd92ec1e13500b74308cb
Author: Sital Kedia <skedia@fb.com>
Date:   2017-03-01T02:13:07Z

    no-op when external shuffle service is disabled

commit c7c3129dcc4ad2fc1a75bff5a941f6c4a8dfd0ef
Author: Sital Kedia <skedia@fb.com>
Date:   2017-03-01T02:23:59Z

    fix check style

commit f96ec68d6922fe2108c5869fedf2d8aca373c6eb
Author: Sital Kedia <skedia@fb.com>
Date:   2017-03-16T22:24:43Z

    Addressed review comments and fixed a bug

commit d4979e35137152db00c53ea0b9e82aaf41dad5b5
Author: Sital Kedia <skedia@fb.com>
Date:   2017-03-16T23:00:17Z

    Fix build

commit 8787db1679c5b468afa3d2ede64eee53908fa5de
Author: Sital Kedia <skedia@fb.com>
Date:   2017-03-17T02:37:44Z

    Fix test failures

commit 4ca9527a8cf78ba1c3e64c81ee6afc9e93b05fe6
Author: Imran Rashid <irashid@cloudera.com>
Date:   2017-03-17T15:53:25Z

    refactoring & comments

commit 9f64e2931eabd2fcc5909123e73c9c046caceb3b
Author: Sital Kedia <skedia@fb.com>
Date:   2017-03-18T04:03:37Z

    Review comments

commit be3b3dbd2d813a3d1d164d9b7f8127d09b752880
Author: Sital Kedia <skedia@fb.com>
Date:   2017-03-24T22:39:05Z

    Minor changes as per review comments
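Read together, the commit titles above suggest a policy along the following lines. This is a hedged sketch, assuming that the external shuffle service keeps shuffle files available after an executor exits and that host-wide removal applies only when that service is enabled. `handle`, `FetchFailed`, `ExecutorLost` and `externalShuffleService` are hypothetical names, not the DAGScheduler code in the patch:

```scala
// Hypothetical sketch of the unregistration policy implied by the commit titles;
// not the actual DAGScheduler implementation.
final case class Location(host: String, executorId: String)

sealed trait Event
final case class FetchFailed(loc: Location) extends Event
final case class ExecutorLost(executorId: String) extends Event

// Given map-output locations keyed by map id, return the outputs still considered available.
def handle(outputs: Map[Int, Location], event: Event, externalShuffleService: Boolean): Map[Int, Location] =
  event match {
    case FetchFailed(loc) if externalShuffleService =>
      // The host's shuffle service is unreachable: assume every file on that host is gone.
      outputs.filterNot { case (_, l) => l.host == loc.host }
    case FetchFailed(loc) =>
      // Without the external service, host-wide removal is skipped and only
      // the outputs of the executor that served the failed fetch are dropped.
      outputs.filterNot { case (_, l) => l.executorId == loc.executorId }
    case ExecutorLost(_) if externalShuffleService =>
      // Files outlive the executor in the external shuffle service; keep them registered.
      outputs
    case ExecutorLost(execId) =>
      // The executor served its own files, so they are lost with it.
      outputs.filterNot { case (_, l) => l.executorId == execId }
  }
```

Keeping executor loss separate from fetch failure reflects the idea that, with the external service enabled, an executor exiting is not evidence that its files are gone, while a failed fetch from the service is evidence that the host is unreachable.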

----



