spark-reviews mailing list archives

From mridulm <>
Subject [GitHub] spark issue #21698: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Date Wed, 11 Jul 2018 05:14:50 GMT
Github user mridulm commented on the issue:
    @cloud-fan The difference is between a user-defined record order (global or local sort)
and an expectation that record order is repeatable on recomputation.
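    A minimal Python sketch of that distinction, assuming a partition whose records can arrive in a different order when it is recomputed (for example, after an upstream shuffle fetch on retry). `number_records` is a hypothetical order-dependent closure, not a Spark API. Its output changes with arrival order unless a user-defined sort fixes the order first.

```python
def number_records(records):
    # Hypothetical order-dependent closure: tags each record with its position
    # in the iterator (in the spirit of zipWithIndex; not Spark's API).
    return [(i, r) for i, r in enumerate(records)]

first_run = ["b", "a", "c"]   # order seen on the first computation
recomputed = ["a", "c", "b"]  # same records, different order on retry

# No defined order: the result depends on arrival order.
print(number_records(first_run) == number_records(recomputed))  # False

# User-defined (local) sort: recomputation gives the same result.
print(number_records(sorted(first_run)) == number_records(sorted(recomputed)))  # True
```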
    It might also be a good idea to explore how other frameworks handle this.
    > However, the round-robin partitioner (followed by a shuffle) violates it.
    This is not limited to repartition: any closure that depends on input order has the
same effect. repartition/coalesce is one instance of this issue; I gave a few examples from
Spark itself, and I am sure there are other examples in Spark and in user code.
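    To illustrate why round-robin assignment is one such order-dependent closure, here is a simplified Python model. It assumes the i-th record a map task sees goes to output partition `(start + i) % n`; Spark's actual implementation differs in detail, and `round_robin` is a hypothetical helper. If a map task is recomputed and sees its input in a different order, records land in different output partitions, so a reducer that re-reads only the recomputed output can see duplicates and miss records.

```python
from collections import defaultdict

def round_robin(records, num_partitions, start=0):
    # Simplified model: the i-th record seen goes to partition (start + i) % n.
    out = defaultdict(list)
    for i, r in enumerate(records):
        out[(start + i) % num_partitions].append(r)
    return dict(out)

first_run = round_robin(["a", "b", "c", "d"], 2)
retry = round_robin(["b", "a", "d", "c"], 2)  # same records, different order

print(first_run)  # {0: ['a', 'c'], 1: ['b', 'd']}
print(retry)      # {0: ['b', 'd'], 1: ['a', 'c']}

# Reducer 0 already consumed the first run's output; reducer 1 failed and
# re-reads from the recomputed map output.
seen = first_run[0] + retry[1]
print(sorted(seen))  # ['a', 'a', 'c', 'c']: 'b' and 'd' are lost, 'a' and 'c' duplicated
```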
    It is possible this issue was initially identified via repartition, but modeling the
solution around only one manifestation of the issue ignores all the others and leaves them unfixed.

