flink-user mailing list archives

From Aljoscha Krettek <aljos...@apache.org>
Subject Re: Event processing time with lateness
Date Sat, 04 Jun 2016 14:05:54 GMT
Hi Igor,
you might be interested in this doc about how we want to improve handling
of late data and some other things in the windowing API:
https://docs.google.com/document/d/1Xp-YBf87vLTduYSivgqWVEMjYUmkA-hyb4muX3KRl08/edit?usp=sharing

I've sent it around several times but you can never know who's aware of it
already. :-)

Cheers,
Aljoscha

On Fri, 3 Jun 2016 at 22:02 Michael Tamillow <mikaeltamillow96@gmail.com>
wrote:

> Super cool stuff
>
> On Fri, Jun 3, 2016 at 10:55 AM, Kostas Kloudas <
> k.kloudas@data-artisans.com> wrote:
>
>> You are welcome!
>>
>>
>> On Jun 3, 2016, at 4:40 PM, Igor Berman <igor.berman@gmail.com> wrote:
>>
>> thanks Kosta
>>
>> On 3 June 2016 at 16:47, Kostas Kloudas <k.kloudas@data-artisans.com>
>> wrote:
>>
>>> Hi Igor,
>>>
>>> To handle late events in Flink you would have to implement your own
>>> custom trigger.
>>>
>>> For a more complete example of such a trigger and how to implement it,
>>> you can have a look at this implementation:
>>> https://github.com/dataArtisans/beam_comp/blob/master/src/main/java/com/dataartisans/beam_comparison/customTriggers/EventTimeTriggerWithEarlyAndLateFiring.java
>>>
>>> It implements the trigger described in this article (just before the
>>> conclusions section):
>>> http://data-artisans.com/why-apache-beam/
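>>>
>>> Very roughly, the idea is something like the following. This is only a
>>> sketch, not the code from the file above; the exact Trigger API differs a
>>> bit between Flink versions, and depending on your version you may also have
>>> to implement clear(). It fires when the watermark passes the end of the
>>> window, and fires again for every element that arrives after that, without
>>> purging the window contents:
>>>
>>> import org.apache.flink.streaming.api.windowing.triggers.Trigger;
>>> import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
>>> import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
>>>
>>> // Sketch: an event-time trigger that also fires on late elements.
>>> public class EventTimeTriggerWithLateFiring extends Trigger<Object, TimeWindow> {
>>>
>>>     @Override
>>>     public TriggerResult onElement(Object element, long timestamp,
>>>                                    TimeWindow window, TriggerContext ctx) throws Exception {
>>>         if (window.maxTimestamp() <= ctx.getCurrentWatermark()) {
>>>             // Late element: the watermark already passed the end of the window,
>>>             // so fire again with the (updated) full window contents.
>>>             return TriggerResult.FIRE;
>>>         }
>>>         // On-time element: fire once the watermark reaches the end of the window.
>>>         ctx.registerEventTimeTimer(window.maxTimestamp());
>>>         return TriggerResult.CONTINUE;
>>>     }
>>>
>>>     @Override
>>>     public TriggerResult onEventTime(long time, TimeWindow window, TriggerContext ctx) {
>>>         return time == window.maxTimestamp() ? TriggerResult.FIRE : TriggerResult.CONTINUE;
>>>     }
>>>
>>>     @Override
>>>     public TriggerResult onProcessingTime(long time, TimeWindow window, TriggerContext ctx) {
>>>         return TriggerResult.CONTINUE;
>>>     }
>>> }
>>>
>>> Because the trigger only ever FIREs and never purges, the window keeps its
>>> elements, so a late firing re-evaluates the full contents; in a real job
>>> you also have to decide when to finally purge the window state.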
>>>
>>> Thanks,
>>> Kostas
>>>
>>> On Jun 3, 2016, at 2:55 PM, Igor Berman <igor.berman@gmail.com> wrote:
>>>
>>> Hi
>>>
>>> according to this presentation by Tyler Akidau
>>> https://docs.google.com/presentation/d/13YZy2trPugC8Zr9M8_TfSApSCZBGUDZdzi-WUz95JJw/present
>>> Flink supports late arrivals for window processing, while I've seen several
>>> questions on the user list regarding late arrivals where the answer was,
>>> more or less, "not for all use cases".
>>> Can somebody clarify?
>>>
>>> The interesting case for me: I have event-time processing, and I want to
>>> aggregate by tumbling window. The events come from Kafka and might be
>>> late. Currently we define the lateness threshold with the watermark (e.g. 5 mins).
>>>
>>> After the window triggers I want to save the aggregated result to some
>>> persistent storage (Redis/HBase), keyed by the start timestamp of the window.
>>>
>>> After this grace period - if I understand correctly - a late event won't be
>>> aggregated into the existing window; instead the trigger will call the
>>> aggregate function with only 1 element inside (the late one).
>>>
>>> So if my window function saves into persistent storage, it will overwrite
>>> the aggregated result with a new one that has only 1 element inside.
>>>
>>> What I want to achieve is that a late arrival triggers the window function
>>> with all elements (the late one + all the others), so that the aggregated
>>> result is complete.
>>>
>>> You can think of the use case of page-visit counts per minute, where due
>>> to some problems the page-visit events might arrive late.
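>>>
>>> Roughly, the job I have in mind looks like this (just a sketch; PageVisit,
>>> the schema/assigner/sink classes, kafkaProps and SomeLateFiringTrigger are
>>> placeholders for whatever we end up implementing):
>>>
>>> StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
>>> env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
>>>
>>> DataStream<PageVisit> visits = env
>>>     .addSource(new FlinkKafkaConsumer09<>("page-visits", new PageVisitSchema(), kafkaProps))
>>>     // watermark trails the highest event timestamp seen by 5 minutes (the lateness threshold)
>>>     .assignTimestampsAndWatermarks(new VisitTimestampsAndWatermarks());
>>>
>>> visits
>>>     .keyBy(visit -> visit.pageId)
>>>     .timeWindow(Time.minutes(1))              // tumbling 1-minute event-time windows
>>>     .trigger(new SomeLateFiringTrigger())     // should fire again when late events arrive
>>>     .apply(new CountVisitsPerWindow())        // emits (pageId, windowStart, count)
>>>     .addSink(new UpsertByWindowStartSink());  // upsert into Redis/HBase keyed by (pageId, windowStart)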
>>>
>>> thanks in advance
>>>
>>>
>>>
>>>
>>
>>
>
