apex-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Pramod Immaneni <pra...@datatorrent.com>
Subject load based stream partitioning
Date Thu, 11 Feb 2016 19:56:26 GMT

There are scenarios where the downstream partitions of an upstream operator
are generally not performing uniformly resulting in an overall sub-optimal
performance dictated by the slowest partitions. The reasons could be data
related such as some partitions are receiving more data to process than the
others or could be environment related such as some partitions are running
slower than others because they are on heavily loaded nodes.

A solution based on currently available functionality in the engine would
be to write a StreamCodec implementation to distribute data among the
partitions such that each partition is receiving similar amount of data to
process. We should consider adding StreamCodecs like these to the library
but these however do not solve the problem when it is environment related.

For that a better and more comprehensive approach would be look at how data
is being consumed by the downstream partitions from the BufferServer and
use that information to make decisions on how to send future data. If some
partitions are behind others in consuming data then data can be directed to
the other partitions. One way to do this would be to relay this type of
statistical and positional information from BufferServer to the upstream
publishers. The publishers can use this information in ways such as making
it available to StreamCodecs to affect destination of future data.

What do you think.


  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message