flink-dev mailing list archives

From "Matthias J. Sax (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-2620) StreamPartitioner is not properly initialized for shuffle and rebalance
Date Fri, 04 Sep 2015 11:53:45 GMT
Matthias J. Sax created FLINK-2620:

             Summary: StreamPartitioner is not properly initialized for shuffle and rebalance
                 Key: FLINK-2620
                 URL: https://issues.apache.org/jira/browse/FLINK-2620
             Project: Flink
          Issue Type: Bug
          Components: Streaming
            Reporter: Matthias J. Sax

When the rebalance connection pattern is used, the round-robin distribution starts with the same
receiver index in all parallel tasks. For a high receiver degree of parallelism (dop) and (very)
small data, this can result in an imbalanced distribution. For example, consider 100 source tasks,
each emitting 10 records to 100 consumer tasks. Only the first 10 consumer tasks receive data
(100 records each, one from every source task), while the other 90 receive nothing.

A possible fix would be to compute a different starting index for each producer task:

startIdx = (numReceivers / numSenders) * myIdx
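
To illustrate the effect of the proposed offset, here is a minimal simulation of the example above (100 senders, 10 records each, 100 receivers). It is only a sketch and not Flink code: the class RebalanceSketch and its methods are hypothetical names, and the division is assumed to be integer division (which yields 0 whenever there are more senders than receivers).

```java
import java.util.Arrays;

class RebalanceSketch {

    // Hypothetical helper: counts how many records each receiver gets when
    // every sender distributes its records round-robin.
    static int[] distribute(int numSenders, int numReceivers,
                            int recordsPerSender, boolean offsetStart) {
        int[] received = new int[numReceivers];
        for (int myIdx = 0; myIdx < numSenders; myIdx++) {
            // Proposed fix: startIdx = (numReceivers / numSenders) * myIdx.
            // Without the fix, every sender starts at the same index (0 here).
            int startIdx = offsetStart ? (numReceivers / numSenders) * myIdx : 0;
            for (int r = 0; r < recordsPerSender; r++) {
                received[(startIdx + r) % numReceivers]++;
            }
        }
        return received;
    }

    // Number of receivers that got no record at all.
    static long idleReceivers(int[] received) {
        return Arrays.stream(received).filter(c -> c == 0).count();
    }

    public static void main(String[] args) {
        int[] sameStart = distribute(100, 100, 10, false);
        int[] offsetStart = distribute(100, 100, 10, true);
        System.out.println("idle receivers, same start:   " + idleReceivers(sameStart));   // 90
        System.out.println("idle receivers, offset start: " + idleReceivers(offsetStart)); // 0
    }
}
```

With the offset, every receiver in this scenario ends up with exactly 10 records.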

For shuffle grouping, the random number generator is initialized with the same seed in all
parallel tasks. This should be changed so that each task uses a different seed.
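
The seeding problem can be reproduced with plain java.util.Random, as in the sketch below. The class ShuffleSeedSketch, the helper names, and the derivation "base seed plus task index" are assumptions made for illustration; the issue does not prescribe how the per-task seeds should be chosen.

```java
import java.util.Arrays;
import java.util.Random;

class ShuffleSeedSketch {

    // Hypothetical seed derivation: offset a shared base seed by the task index.
    static long seedFor(long baseSeed, int subtaskIndex) {
        return baseSeed + subtaskIndex;
    }

    // First `count` receiver indexes a task would pick with the given seed.
    static int[] firstChoices(long seed, int numReceivers, int count) {
        Random random = new Random(seed);
        int[] choices = new int[count];
        for (int i = 0; i < count; i++) {
            choices[i] = random.nextInt(numReceivers);
        }
        return choices;
    }

    public static void main(String[] args) {
        long baseSeed = 42L;

        // Same seed in two tasks: both pick exactly the same receivers.
        int[] taskA = firstChoices(baseSeed, 100, 32);
        int[] taskB = firstChoices(baseSeed, 100, 32);
        System.out.println("shared seed, identical choices: " + Arrays.equals(taskA, taskB));

        // Per-task seeds: the two tasks pick different receiver sequences.
        int[] task0 = firstChoices(seedFor(baseSeed, 0), 100, 32);
        int[] task1 = firstChoices(seedFor(baseSeed, 1), 100, 32);
        System.out.println("per-task seed, identical choices: " + Arrays.equals(task0, task1));
    }
}
```

Two generators created with the same seed always produce the same sequence, so tasks sharing a seed send their n-th record to the same receiver.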

To achieve both, the StreamPartitioner class should be extended with a
configuration / initialize method which is called on each parallel operator.

For a full discussion please see the mailing list archive: https://mail-archives.apache.org/mod_mbox/flink-dev/201509.mbox/browser
