flink-user mailing list archives

From Aljoscha Krettek <aljos...@apache.org>
Subject Re: Question about watermark and window
Date Mon, 28 Aug 2017 12:00:32 GMT
Hi Tony,

I think your analysis is correct. In particular, yes: if you re-read the data, the (ts=3) event
should still be considered late if both consumers read at the same speed. If, however, (ts=3)
is read before the other consumer reads (ts=8) then it should not be considered late, as you

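The two read orders discussed above can be sketched as a small simulation. This is plain Python, not Flink code, and it rests on simplifying assumptions: each partition's watermark is its highest timestamp seen so far minus 1 (roughly what an ascending-timestamp extractor produces), the watermark is updated after every element rather than periodically, the window operator's watermark is the minimum over both partitions, and allowed lateness is 0. The names `simulate` and `WINDOW_SIZE` are made up for illustration.

```python
from collections import defaultdict

WINDOW_SIZE = 5  # tumbling window size, in the same units as the timestamps


def simulate(arrivals, partitions=("A", "B")):
    """Replay (partition, ts) pairs in the order the window operator sees them.

    Returns (emitted windows, windows still in state, dropped late events).
    """
    partition_wm = {p: float("-inf") for p in partitions}
    current_wm = float("-inf")
    open_windows = defaultdict(list)  # window start -> buffered timestamps
    emitted, dropped = [], []

    for partition, ts in arrivals:
        start = ts - ts % WINDOW_SIZE
        window_max_ts = start + WINDOW_SIZE - 1
        # With no allowed lateness, an event is late once the watermark
        # has reached the last timestamp of its window.
        if window_max_ts <= current_wm:
            dropped.append(ts)
        else:
            open_windows[start].append(ts)

        # Assumed ascending-timestamp behaviour: the watermark never moves back.
        partition_wm[partition] = max(partition_wm[partition], ts - 1)
        # The operator's watermark is the minimum over its inputs.
        current_wm = min(partition_wm.values())

        # Fire every window whose last timestamp the watermark has reached.
        for s in sorted(open_windows):
            if s + WINDOW_SIZE - 1 <= current_wm:
                emitted.append(sorted(open_windows.pop(s)))

    in_state = {s: sorted(v) for s, v in open_windows.items()}
    return emitted, in_state, dropped


# Both consumers read at the same speed: (ts=3) arrives after the window fired.
same_speed = [("A", 1), ("B", 2), ("A", 8), ("B", 9), ("B", 3)]
# Consumer B reads (ts=3) before consumer A reads (ts=8).
b_ahead = [("A", 1), ("B", 2), ("B", 9), ("B", 3), ("A", 8)]

print(simulate(same_speed))
print(simulate(b_ahead))
```

Under these assumptions, the first order emits the window holding 1 and 2 and drops (ts=3), while the second order keeps the combined watermark low until partition A advances, so (ts=3) still makes it into the first window. In both cases 8 and 9 stay in state.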

> On 24. Aug 2017, at 15:49, Tony Wei <tony19920430@gmail.com> wrote:
> Hi,
> Recently, I have been studying watermarks from the Flink documentation and blogs.
> I have some questions about the scenario below.
> Suppose there are five clients sending events with different timestamps to a topic on Kafka.
> The topic has two partitions and the five events' timestamps are (ts=1), (ts=2), (ts=3), (ts=8),
> The Flink streaming job uses the following settings:
> 1. use AscendingTimestampExtractor
> 2. use client time as the timestamp
> 3. use a tumbling window with a window size of 5 units
> 4. do not allow late events
> Suppose the client events arrive out of order like this:
>   Partition A [(ts=1), (ts=8)]
>   Partition B [(ts=2), (ts=9)]  <= (ts=3) delay
> Should the window function emit [(ts=1), (ts=2)], keep [(ts=8), (ts=9)] in state, and drop
> (ts=3)?
> If all events have arrived and the job is then replayed from the beginning, the partition state
> would be
>   Partition A [(ts=1), (ts=8)]
>   Partition B [(ts=2), (ts=9), (ts=3)]
> Suppose the two consumers fetch events at the same speed, should the result be the same as
> If consumer B reads (ts=3) earlier than consumer A reads (ts=8), would (ts=3) be placed
> in the window before the watermark advances to 8, so that [(ts=1), (ts=2), (ts=3)] is emitted as the result?
> I wonder whether my understanding in these questions is correct. If not, are there any mechanisms
> around watermarks and windows in Flink that I have missed?
> Thanks for your help.
> Best Regards,
> Tony Wei
