flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Anis Nasir (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-1725) New Partitioner for better load balancing for skewed data
Date Wed, 18 Mar 2015 12:19:38 GMT
Anis Nasir created FLINK-1725:

             Summary: New Partitioner for better load balancing for skewed data
                 Key: FLINK-1725
                 URL: https://issues.apache.org/jira/browse/FLINK-1725
             Project: Flink
          Issue Type: Improvement
          Components: New Components
    Affects Versions: 0.8.1
            Reporter: Anis Nasir


We have recently studied the problem of load balancing in Storm [1].
In particular, we focused on key distribution of the stream for skewed skewede data.
We developed a new stream partitioning scheme (which we call Partial Key Grouping). It achieves
better load balancing than key grouping while being more scalable than shuffle grouping in
terms of memory.

In the paper we show a number of mining algorithms that are easy to implement with partial
key grouping, and whose performance can benefit from it. We think that it might also be useful
for a larger class of algorithms.

Partial key grouping is very easy to implement: it requires just a few lines of code in Java
when implemented as a custom grouping in Storm [2].

For all these reasons, we believe it will be a nice addition to the standard Partitioners
available in Flink. If the community thinks it's a good idea, we will be happy to offer support
in the porting.

[1]. https://melmeric.files.wordpress.com/2014/11/the-power-of-both-choices-practical-load-balancing-for-distributed-stream-processing-engines.pdf
[2]. https://github.com/gdfm/partial-key-grouping

This message was sent by Atlassian JIRA

View raw message