flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From M Singh <mans2si...@yahoo.com>
Subject Re: Flink - Handling late events - main vs late window firing
Date Mon, 14 Aug 2017 20:53:37 GMT
Thanks Aljoscha for your response.
Just to clarify - the only way to handle the duplication scenario properly is by using the
ProcessWindowFunction - there is no high level function for this.

Thanks again. 

    On Wednesday, August 9, 2017 6:26 AM, Aljoscha Krettek <aljoscha@apache.org> wrote:

1. You could use a ProcessWindowFunction instead of a WindowFunction. In there, you can query
the current watermark and thus determine why the firing is happening. Also, in a ProcessWindowFunction
you can keep per-window state, this would allow you to keep a bit of state that can tell you
whether this is the first firing for a given window or the number of firings so far.
2. This depends on whether the Trigger is purging or not. The default EventTimeTrigger is
not purging, meaning that all elements in the window will be preserved after firing (until
the watermark reaches the end of the window plus the allowed lateness). You can turn this
into a purging trigger using PurgingTrigger.of(EventTimeTrigger.create()). You would specify
this using .trigger() on WindowedStream when constructing your windowed operation.
3. It doesn't, you have to manually keep state in a ProcessWindowFunction to distinguish between
different cases, as mentioned above.
4. Currently, I think there are no examples because this depends to a large degree on the
specifics of the application. I'm afraid.

On 6. Aug 2017, at 18:45, M Singh <mans2singh@yahoo.com> wrote:
Hi Folks:
I am going through flink documentation and it states the following:
"You should be aware that the elements emitted by a late firing should be treated as updated
results of a previous computation, i.e., your data stream will contain multiple results for
the same computation. Depending on your application, you need to take these duplicated results
into account or deduplicate them."

I wanted to find out the following:
1. How do we distinguish the late firing from the main firing ?2. Does the late firing including
all events or only late events ?3. How does the late vs main firing affect the associated
window function ?
4. Are there any examples of how to handle these events and deduplication mentioned in the
documentation ?
Thanks for your help.

View raw message