flink-issues mailing list archives

From anisnasir <...@git.apache.org>
Subject [GitHub] flink pull request: [FLINK-1725]- New Partitioner for better load ...
Date Fri, 28 Aug 2015 22:47:23 GMT
Github user anisnasir commented on the pull request:

    https://github.com/apache/flink/pull/1069#issuecomment-135905274
  
    @tillrohrmann you are absolutely right that highly skewed data requires more than two workers
to process the most frequent keys. However, most real-world datasets are not that highly
skewed [1] and can be handled by splitting each key across just two workers.
    
    I was planning to write a WordCount example with both HashPartitioner and PartialPartitioner.
Could you explain a little more how one could check that the skew of the input data decreases
after partitioning?
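
    One possible way to check this outside Flink is a small simulation that compares the
per-worker load under plain hash partitioning with the load when each key may go to either of
two candidate workers. The sketch below is only an illustration and makes several assumptions:
the function names (hash_partition, partial_partition, imbalance, zipf_words) are hypothetical
and are not the HashPartitioner or PartialPartitioner from this pull request, the input is
synthetic Zipf-like data, and imbalance is taken to be the maximum worker load minus the
average load.

```python
import hashlib
import random


def _hash(key, seed, num_workers):
    # Deterministic seeded hash (the built-in hash() is randomized per process).
    digest = hashlib.md5(f"{seed}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_workers


def hash_partition(keys, num_workers):
    # Every occurrence of a key goes to the single worker chosen by one hash.
    loads = [0] * num_workers
    for key in keys:
        loads[_hash(key, 0, num_workers)] += 1
    return loads


def partial_partition(keys, num_workers):
    # Each key has two candidate workers; every occurrence goes to whichever
    # candidate currently has the smaller load (assumed "two choices" scheme).
    loads = [0] * num_workers
    for key in keys:
        first = _hash(key, 1, num_workers)
        second = _hash(key, 2, num_workers)
        target = first if loads[first] <= loads[second] else second
        loads[target] += 1
    return loads


def imbalance(loads):
    # Assumed metric: load of the busiest worker minus the average load.
    return max(loads) - sum(loads) / len(loads)


def zipf_words(n, vocab, exponent, seed):
    # Synthetic word stream whose frequencies follow a Zipf-like law.
    rng = random.Random(seed)
    weights = [1.0 / (rank ** exponent) for rank in range(1, vocab + 1)]
    return [f"w{i}" for i in rng.choices(range(vocab), weights=weights, k=n)]


words = zipf_words(n=20000, vocab=1000, exponent=1.0, seed=42)
hash_loads = hash_partition(words, num_workers=8)
partial_loads = partial_partition(words, num_workers=8)
print("hash partitioning imbalance:   ", imbalance(hash_loads))
print("partial partitioning imbalance:", imbalance(partial_loads))
```

    If the second number is clearly smaller than the first, the partitioning has reduced the
skew seen by the workers. In an actual WordCount job the same comparison could presumably be
made by counting the records received by each parallel subtask.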
    
    [1]. https://melmeric.files.wordpress.com/2014/11/the-power-of-both-choices-practical-load-balancing-for-distributed-stream-processing-engines.pdf


