flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Lasse Nedergaard <lassenederga...@gmail.com>
Subject Re: Watermark for each key?
Date Wed, 24 Apr 2019 17:58:18 GMT
Thanks Till

What about this workaround. 
If I after the watermark assignment split the stream in elements that fits in the watermark
(s1) and those that don’t (s2). The s1 I process with the table api with a window aggregate
using watermark and s2 I handle with an unbounded non-windows aggregate with IdleStateRentionTime
so state are removed when my devices are up to date again. I then merge the two outputs and
continue. 
By doing this I handle 99% as standard and only keeping state for the late data. 

Make sense? And would it work?

Med venlig hilsen / Best regards
Lasse Nedergaard


> Den 24. apr. 2019 kl. 19.00 skrev Till Rohrmann <trohrmann@apache.org>:
> 
> Hi Lasse,
> 
> at the moment this is not supported out of the box by Flink. The community thought about
this feature but so far did not implement it. Unfortunately, I'm also not aware of an easy
workaround one could do in the user code space.
> 
> Cheers,
> Till
> 
>> On Wed, Apr 24, 2019 at 3:26 PM Lasse Nedergaard <lassenedergaard@gmail.com>
wrote:
>> Hi.
>> 
>> We work with IoT data and we have cases where the IoT-device delay data transfer
if it can't get network access. We would like to use table windows aggregate function over
each device to calculate some statistics, but for windows aggregate functions to work we need
to assign a watermark. This watermark is general for all devices. We can set allow latency,
but we can't set it to months. 
>> So what we need is to have a watermark for each device (key by) so the window aggregate
work on the timestamp delivered for the device and not the global watermark. 
>> Is that possible, or have anyone consider this feature?
>> 
>> Best 
>> 
>> Lasse Nedergaard

Mime
View raw message