flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Elias Levy <fearsome.lucid...@gmail.com>
Subject Re: Does Kafka connector leverage Kafka message keys?
Date Thu, 14 Apr 2016 01:10:48 GMT
On Wed, Apr 13, 2016 at 2:10 AM, Stephan Ewen <sewen@apache.org> wrote:

> If you want to use Flink's internal key/value state, however, you need to
> let Flink re-partition the data by using "keyBy()". That is because Flink's
> internal sharding of state (including the re-sharding to adjust parallelism
> we are currently working on) follows a dedicated hashing scheme which is
> with all likelihood different from the partition function that writes the
> key/value pairs to the Kafka Topics.

That is interesting, if somewhat disappointing.  I was hoping that
performing a keyBy from a Kafka source would perform no reshuffling if you
used the same value as you used for the Kafka message key.  But it makes
sense if you are using different hash functions.

It may be useful to have a variant of keyBy() that converts the stream to a
KeyedStream but performs no shuffling if the caller is certain that the
DataStream is already partitioned by the given key.

View raw message