spark-reviews mailing list archives

From cloud-fan <...@git.apache.org>
Subject [GitHub] spark pull request #22112: [WIP][SPARK-23243][Core] Fix RDD.repartition() da...
Date Wed, 15 Aug 2018 20:02:52 GMT
GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/22112

    [WIP][SPARK-23243][Core] Fix RDD.repartition() data correctness issue

    ## What changes were proposed in this pull request?
    
    An alternative fix for https://github.com/apache/spark/pull/21698
    
    An RDD can take an arbitrary user function, but we have an assumption: the function should
produce the same data set for the same input, although the output order may vary.
    
    The Spark scheduler must take care of this assumption when a fetch failure happens, otherwise
we may hit the correctness issue described in the JIRA ticket.
    
    Generally speaking, when a map stage gets retried because of a fetch failure, and this map
stage is not idempotent (it produces the same data set but in a different order each time), and the
shuffle partitioner is sensitive to the input data order (like the round-robin partitioner), we
should retry all the reduce tasks.
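    To illustrate the order-sensitivity problem described above, here is a minimal standalone sketch (not Spark's actual implementation; `roundRobinAssign` is a hypothetical helper): when records are dealt to partitions by position, a retried map task that emits the same set in a different order sends different records to each reduce partition.

```scala
// Hypothetical sketch: round-robin partitioning assigns each record to a
// partition by its position in the sequence, so the assignment depends on
// the order in which records arrive.
object RoundRobinOrderSensitivity {
  def roundRobinAssign[T](records: Seq[T], numPartitions: Int): Map[Int, Seq[T]] =
    records.zipWithIndex
      .groupBy { case (_, i) => i % numPartitions } // partition = position mod N
      .map { case (p, recs) => p -> recs.map(_._1) }

  def main(args: Array[String]): Unit = {
    val attempt1 = Seq("a", "b", "c", "d") // first attempt's output order
    val attempt2 = Seq("b", "a", "c", "d") // retried attempt: same set, new order
    // Same overall data set, but each partition receives different records,
    // so reduce tasks that already ran against attempt1 see inconsistent data.
    println(roundRobinAssign(attempt1, 2)) // Map(0 -> List(a, c), 1 -> List(b, d))
    println(roundRobinAssign(attempt2, 2)) // Map(0 -> List(b, c), 1 -> List(a, d))
  }
}
```

    A hash partitioner would not have this problem, since the target partition depends only on the record itself, not its position.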
    
    TODO: document and test

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark repartition

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22112.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22112
    
----
commit 1f9f6e5b020038be1e7c11b9923010465da385aa
Author: Wenchen Fan <wenchen@...>
Date:   2018-08-15T18:38:24Z

    fix repartition+shuffle bug

----


---


