flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Smirnov Sergey Vladimirovich (39833)" <s.smirn...@tinkoff.ru>
Subject kafka partitions, data locality
Date Wed, 17 Apr 2019 09:33:05 GMT

We planning to use apache flink as a core component of our new streaming system for internal
processes (finance, banking business) based on apache kafka.
So we starting some research with apache flink and one of the question, arises during that
work, is how flink handle with data locality.
I`ll try to explain: suppose we have a kafka topic with some kind of events. And this events
groups by topic partitions so that the handler (or a job worker), consuming message from a
partition, have all necessary information for further processing.
As an example, say we have client's payment transaction in a kafka topic. We grouping by clientId
(transaction with the same clientId goes to one same kafka topic partition) and the task is
to find max transaction per client in sliding windows. In terms of map\reduce there is no
needs to shuffle data between all topic consumers, may be it`s worth to do within each consumer
to gain some speedup due to increasing number of executors within each partition data.
And my question is how flink will work in this case. Do it shuffle all data, or it have some
settings to avoid this extra unnecessary shuffle/sorting operations?
Thanks in advance!

With best regards,
Sergey Smirnov

View raw message