flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Till Rohrmann <trohrm...@apache.org>
Subject Re: global watermark across multiple kafka consumers
Date Wed, 16 Dec 2015 10:06:07 GMT
Hi Andrew,

as far as I know, there is nothing such as a prescribed way of handling
this kind of situation. If you want to synchronize the watermark generation
given a set of KafkaConsumers you need some kind of ground truth.

This could be, for example, a central registry such as ZooKeeper in which
you collect the current watermarks of the different consumers. You could
access ZooKeeper from inside the TimestampExtractor.

Alternatively, however a bit more hacky, you could exploit that the
consumer tasks are usually colocated with consumer tasks from different
topics. This means that you'll have multiple subtasks reading from the
different Kafka topics running in the same JVM. You could then use class
variables to synchronize the watermarks. But this assumes that each subtask
reading the topic t from Kafka is colocated with at least one other subtask
reading the topic t' from Kafka with t' in T \ {t} and T being the set of
Kafka topics. Per default this should be the case.

I'm wondering why you need a global watermark for you Kafka topics. Isn't
it enough that you have individual watermarks for each topic?

Cheers,
Till

On Tue, Dec 15, 2015 at 4:45 PM, Griess, Andrew <andrew.griess@sap.com>
wrote:

> Hi guys,
>
> I have a question related to utilizing watermarks with multiple
> FlinkKakfkaConsumer082 instances. The aim is to have a global watermark
> across multiple kafka consumers where any message from any kafka partition
> would update the same watermark. When testing a simple TimeStampExtractor
> implementation it seems each consumer results in a separate watermark. Is
> there a prescribed way of handling such a thing that anyone has any
> experience with?
>
> Thanks for your help,
>
> Andrew Griess
>
>

Mime
View raw message