flink-user mailing list archives

From Kostas Kloudas <k.klou...@data-artisans.com>
Subject Re: Event processing time with lateness
Date Fri, 03 Jun 2016 13:47:41 GMT
Hi Igor,

To handle late events in Flink, you would have to implement your own custom trigger.

For a more complex example of such a trigger and how to implement it,
have a look at this implementation: https://github.com/dataArtisans/beam_comp/blob/master/src/main/java/com/dataartisans/beam_comparison/customTriggers/EventTimeTriggerWithEarlyAndLateFiring.java

It implements the trigger described in this article (before the conclusions section):
http://data-artisans.com/why-apache-beam/
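
To illustrate the behaviour such a trigger aims for, here is a minimal plain-Java sketch. It deliberately does not use the Flink Trigger API, and it is not the code from the linked repository. The class name LateFiringWindowSketch, the allowedLateness parameter, and the in-memory "store" map (standing in for redis/hbase) are all assumptions made for illustration. The idea is that window state is kept after the first firing, so a late element re-fires the window with the complete count rather than a count of 1.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of "fire on time, then fire again on each late element".
// Not Flink code: all names here are illustrative assumptions.
class LateFiringWindowSketch {
    private final long windowSize;       // tumbling window length in ms
    private final long allowedLateness;  // how long window state is kept after the window ends
    private long watermark = Long.MIN_VALUE;

    // window start -> running count; kept (not purged) until the grace period is over
    private final TreeMap<Long, Long> counts = new TreeMap<>();
    // stand-in for the external store, keyed by window start timestamp
    private final Map<Long, Long> store = new HashMap<>();

    LateFiringWindowSketch(long windowSize, long allowedLateness) {
        this.windowSize = windowSize;
        this.allowedLateness = allowedLateness;
    }

    void onElement(long timestamp) {
        long start = timestamp - Math.floorMod(timestamp, windowSize);
        long end = start + windowSize;
        if (watermark >= end + allowedLateness) {
            return; // beyond the grace period: the element is dropped
        }
        counts.merge(start, 1L, Long::sum);
        if (watermark >= end) {
            // late element inside the grace period: fire again with the FULL count
            store.put(start, counts.get(start));
        }
    }

    void onWatermark(long newWatermark) {
        if (newWatermark <= watermark) {
            return; // watermarks only move forward
        }
        long old = watermark;
        watermark = newWatermark;
        Iterator<Map.Entry<Long, Long>> it = counts.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Long> e = it.next();
            long end = e.getKey() + windowSize;
            if (old < end && watermark >= end) {
                store.put(e.getKey(), e.getValue()); // on-time firing, state is kept
            }
            if (watermark >= end + allowedLateness) {
                it.remove(); // grace period over: purge the window state
            }
        }
    }

    Map<Long, Long> store() {
        return store;
    }

    public static void main(String[] args) {
        LateFiringWindowSketch w = new LateFiringWindowSketch(60_000L, 300_000L);
        w.onElement(1_000L);
        w.onElement(2_000L);
        w.onWatermark(60_000L);   // on-time firing: window 0 has count 2
        w.onElement(3_000L);      // late, but within 5 minutes: re-fires with count 3
        w.onWatermark(360_000L);  // grace period over, state purged
        w.onElement(4_000L);      // too late, dropped
        System.out.println(w.store());
    }
}
```

Because each firing overwrites the stored value under the same window-start key, the store always holds the most complete aggregate seen so far. The price is that window state has to be kept until the grace period expires.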


> On Jun 3, 2016, at 2:55 PM, Igor Berman <igor.berman@gmail.com> wrote:
> Hi,
> According to the presentation by Tyler Akidau (https://docs.google.com/presentation/d/13YZy2trPugC8Zr9M8_TfSApSCZBGUDZdzi-WUz95JJw/present), Flink supports late arrivals for window processing, but I've seen several questions on the user list regarding late arrivals where the answer was, roughly, "not for all use cases".
> Can somebody clarify?
> The interesting case for me: I have event processing time, and I want to aggregate by tumbling window. The events come from Kafka and might be late. Currently we define the lateness threshold with the watermark (e.g. 5 mins).
> After the window triggers, I want to save the aggregated result to some persistent storage (redis/hbase) with the start timestamp of the window.
> After this grace period, if I understand correctly, a late event won't be aggregated into the existing window; rather, the trigger will call the aggregation function with only 1 element inside (the late one),
> so if my window method saves into persistent storage, it will overwrite the aggregated result with a new one that has only 1 element inside.
> What I want to achieve is that a late arrival triggers the window method with all elements (late + all others), so that the aggregated result is complete.
> You can think of the use case of page visit counts per minute, where due to some problems page visit events might arrive late.
> Thanks in advance
