spark-reviews mailing list archives

From mridulm <...@git.apache.org>
Subject [GitHub] spark issue #21698: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Date Wed, 11 Jul 2018 05:14:50 GMT
Github user mridulm commented on the issue:

    https://github.com/apache/spark/pull/21698
  
    @cloud-fan The difference would be between a user-defined record order (global sort
or local sort) and an expectation of a repeatable record order on recomputation.
    It might also be a good idea to explore how other frameworks handle this.
    
    > However, the round robin partitioner (following with a shuffle) violates it.
    
    This is not limited to repartition: any closure that depends on input order has the
same effect. repartition/coalesce is one instance of this issue. I gave a few examples from
Spark itself, and I am sure there are other examples in Spark and in user code.
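    
    A minimal sketch of the effect, in plain Python rather than Spark code. The helper
names (round_robin, by_key), the sample records, and the "retry sees a different order"
scenario are assumptions made for illustration; this is a simplified model of a task
retry, not how Spark implements repartition.

```python
def round_robin(records, num_partitions):
    """Order-dependent: a record's partition is decided by its arrival position."""
    partitions = [[] for _ in range(num_partitions)]
    for position, record in enumerate(records):
        partitions[position % num_partitions].append(record)
    return partitions


def by_key(records, num_partitions):
    """Order-independent: a record's partition is decided by the record itself."""
    partitions = [[] for _ in range(num_partitions)]
    for record in records:
        partitions[ord(record) % num_partitions].append(record)
    return partitions


records = ["a", "b", "c", "d", "e", "f"]
# Hypothetical recomputation: same records, but the first two arrive swapped.
retry = ["b", "a", "c", "d", "e", "f"]

# Model a partial retry: partition 0 was already consumed from the first
# attempt, while partition 1 is rebuilt from the recomputed input.
mixed_rr = round_robin(records, 2)[0] + round_robin(retry, 2)[1]
mixed_key = by_key(records, 2)[0] + by_key(retry, 2)[1]

print(sorted(mixed_rr))   # ['a', 'a', 'c', 'd', 'e', 'f']  ("a" duplicated, "b" lost)
print(sorted(mixed_key))  # ['a', 'b', 'c', 'd', 'e', 'f']
```

    Any function whose output depends on the position of a record in its input behaves
like round_robin here, which is why the problem is not specific to repartition.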
    
    It is possible this issue was initially identified via repartition, but modeling the
solution on only one manifestation of the issue ignores all the others and leaves them unfixed.

