flink-user mailing list archives

From Nico Kruber <n...@data-artisans.com>
Subject Re: Kafka watermarks
Date Tue, 20 Jun 2017 16:36:18 GMT
According to the javadoc of assignTimestampsAndWatermarks() on the Kafka consumer:

"Specifies an {@link AssignerWithPunctuatedWatermarks} to emit watermarks in a 
punctuated manner. The watermark extractor will run per Kafka partition, 
watermarks will be merged across partitions in the same way as in the Flink 
runtime, when streams are merged.

When a subtask of a FlinkKafkaConsumer source reads multiple Kafka partitions, 
the streams from the partitions are unioned in a "first come first serve" 
fashion. Per-partition characteristics are usually lost that way. For example, 
if the timestamps are strictly ascending per Kafka partition, they will not be 
strictly ascending in the resulting Flink DataStream, if the parallel source 
subtask reads more that one partition.

Running timestamp extractors / watermark generators directly inside the Kafka 
source, per Kafka partition, allows users to let them exploit the per-
partition characteristics."

Thus, if you can leverage Kafka's per-partition characteristics (e.g. timestamps 
that are strictly ascending within each partition), assign the timestamp 
extractor / watermark generator on the consumer itself; otherwise it probably 
does not matter which of the two options you choose.
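To see why per-partition generators help, here is a minimal self-contained sketch (plain Java, no Flink dependencies; the class and method names are illustrative, not Flink API): two partitions that are each strictly ascending lose that property once they are unioned "first come first serve", while a min-over-partitions watermark, merged the same way Flink merges watermarks of unioned streams, remains safe.

```java
import java.util.ArrayList;
import java.util.List;

public class WatermarkMergeSketch {
    // True iff the sequence is strictly ascending.
    static boolean isStrictlyAscending(List<Long> xs) {
        for (int i = 1; i < xs.size(); i++) {
            if (xs.get(i) <= xs.get(i - 1)) return false;
        }
        return true;
    }

    // Flink-style merge: the watermark of a union is the minimum of the
    // per-input watermarks.
    static long mergedWatermark(long... perPartitionWatermarks) {
        long min = Long.MAX_VALUE;
        for (long w : perPartitionWatermarks) min = Math.min(min, w);
        return min;
    }

    public static void main(String[] args) {
        // Two Kafka partitions, each strictly ascending on its own.
        long[] p0 = {1, 4, 7};
        long[] p1 = {2, 3, 9};

        // "First come first serve" union: one possible interleaving is
        // all of p0, then all of p1 -- ordering across partitions is lost.
        List<Long> merged = new ArrayList<>();
        for (long t : p0) merged.add(t);
        for (long t : p1) merged.add(t);

        System.out.println("merged strictly ascending: "
                + isStrictlyAscending(merged)); // false

        // Per-partition generators each track their partition's max
        // timestamp (7 and 9); the source emits the minimum.
        System.out.println("watermark: " + mergedWatermark(7, 9)); // 7
    }
}
```

The same min-merge is what Flink applies at any stream union, which is why running the generators per partition inside the source preserves the per-partition ascension guarantee instead of discarding it.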


On Tuesday, 20 June 2017 17:46:23 CEST nragon wrote:
> So, in order to work with event time I have two options: inside the Kafka
> consumer or after the Kafka consumer.
> The first I can use:
> FlinkKafkaConsumer09<DataParameterMap> consumer = ...
> consumer.assignTimestampsAndWatermarks()
> The other option:
> FlinkKafkaConsumer09<DataParameterMap> consumer = ...
> DataStream<DataParameterMap> dataStream = env.addSource(consumer);
> dataStream.assignTimestampsAndWatermarks()
> Any recommendation?
> --
> View this message in context:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kafka-watermarks-tp13849p13872.html
> Sent from the Apache Flink User Mailing List archive at Nabble.com.
