flink-user mailing list archives

From Sanne de Roever <sanne.de.roe...@gmail.com>
Subject Decouple Kafka partitions and Flink parallelism for ordered streams
Date Wed, 11 Oct 2017 12:52:34 GMT

Currently we need 75 Kafka partitions per topic and a parallelism of 75 to
meet the required performance. Increasing the partitions and parallelism
further gives diminishing returns.

Currently the performance is approx. 1500 msg/s per core, with one
pipeline (source, map, sink) deployed as one instance per core.

The Kafka source performance is not an issue. The map is very heavy
(deserialization, validation) on rather complex Avro messages. Object reuse
is enabled.

Ideally we would like to decouple Flink processing parallelism from Kafka
partitions in the following manner:

   - Pick a source parallelism
   - Per source, be able to pick a parallelism for the following map
   - In such a way that some message key determines which -local- map
   instance gets a message from a certain visitor
   - So that messages with the same visitor key get processed by the same
   map and in order for that visitor
   - Output the result to Kafka
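
To make the requirement above concrete, here is a minimal plain-Java sketch
of the local routing being asked for. It is not Flink API: the class
LocalKeyedDispatcher and its methods are hypothetical names invented for
illustration, and the key hashing is a simple modulo that is assumed to be
good enough. One source thread hands each message to one of N local
single-threaded workers chosen by the visitor key, so messages for the same
visitor stay on the same worker and keep their order. How such worker
threads would interact with Flink's checkpointing and output collection is
not addressed here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical helper (not part of Flink): key-affine fan-out to local workers. */
final class LocalKeyedDispatcher<T> {
    // One single-threaded executor per local "map" instance; each runs its tasks in FIFO order.
    private final List<ExecutorService> workers = new ArrayList<>();
    private final Consumer<T> heavyMap;

    LocalKeyedDispatcher(int localParallelism, Consumer<T> heavyMap) {
        this.heavyMap = heavyMap;
        for (int i = 0; i < localParallelism; i++) {
            workers.add(Executors.newSingleThreadExecutor());
        }
    }

    /** Same visitor key always maps to the same local worker index. */
    int workerFor(String visitorKey) {
        return Math.floorMod(visitorKey.hashCode(), workers.size());
    }

    /** Called from a single source thread, so per-key submission order is arrival order. */
    void dispatch(String visitorKey, T message) {
        workers.get(workerFor(visitorKey)).execute(() -> heavyMap.accept(message));
    }

    /** Stops accepting work and waits for queued messages to be processed. */
    void close() {
        for (ExecutorService w : workers) {
            w.shutdown();
        }
        try {
            for (ExecutorService w : workers) {
                w.awaitTermination(1, TimeUnit.MINUTES);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Per-visitor ordering in this sketch relies on two assumptions: dispatch is
only ever called from one thread, and each worker is single-threaded.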

AFAIK keyBy and partitionCustom will distribute messages over the network,
and rescale has no affinity for message identity.
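
The affinity point can be illustrated with a simplified plain-Java model.
This is not Flink's actual implementation: round-robin stands in for a
rescale-style distribution and a plain hashCode modulo stands in for
key-based routing, and the class and method names are hypothetical. The
sketch records which downstream instance indices each visitor key ends up
on under each scheme.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical model comparing position-based and key-based routing. */
final class RoutingComparison {

    /** Round-robin: the target depends only on arrival position, not on the key. */
    static Map<String, Set<Integer>> roundRobin(String[] keys, int instances) {
        Map<String, Set<Integer>> targets = new HashMap<>();
        for (int i = 0; i < keys.length; i++) {
            targets.computeIfAbsent(keys[i], k -> new HashSet<>()).add(i % instances);
        }
        return targets;
    }

    /** Key hash: the target depends only on the key, so each key has exactly one target. */
    static Map<String, Set<Integer>> byKeyHash(String[] keys, int instances) {
        Map<String, Set<Integer>> targets = new HashMap<>();
        for (String key : keys) {
            targets.computeIfAbsent(key, k -> new HashSet<>())
                   .add(Math.floorMod(key.hashCode(), instances));
        }
        return targets;
    }
}
```

With the arrival sequence v1, v2, v1, v2, v1 and three instances, the
round-robin model sends v1 to all three instances, while the key-hash model
keeps each visitor on a single instance.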

Am I missing something obvious?


