flink-user mailing list archives

From Stephan Ewen <se...@apache.org>
Subject Re: TimeWindow overload?
Date Tue, 03 May 2016 12:42:43 GMT
Just had a quick chat with Aljoscha...

The first version of the aligned window code will still duplicate the
elements; later versions should be able to get rid of that.

On Tue, May 3, 2016 at 11:10 AM, Aljoscha Krettek <aljoscha@apache.org>
wrote:

> Hi,
> even with the optimized operator for aligned time windows I would advise
> against using long sliding windows with a small slide. The system will
> internally create a lot of "buckets", i.e. each sliding window is treated
> separately and the element is put into 1,440 buckets, in your case. With a
> moderate number of distinct keys this can very quickly lead to a lot of
> created window buckets. You can think of it in terms of write
> amplification. If you have tumbling windows you basically have no
> amplification; if you have sliding windows you have window processing
> overhead for every slide.
>
> Cheers,
> Aljoscha
>
> On Tue, 3 May 2016 at 09:05 Stephan Ewen <sewen@apache.org> wrote:
>
>> Hi Elias!
>>
>> There is a feature pending that uses an optimized version for aligned
>> time windows. In that case, elements would go into a single window pane,
>> and the full window would be composed of all panes it spans (in the case of
>> sliding windows). That should help a lot in those cases.
>>
>> The default window mechanism does it that way, because it supports
>> unaligned windows (where each key has a different window start and
>> endpoint) and it supports completely custom window assigners.
>>
>> Greetings,
>> Stephan
>>
>>
>>
>> On Tue, May 3, 2016 at 4:07 AM, Elias Levy <fearsome.lucidity@gmail.com>
>> wrote:
>>
>>> Looking over the code, I see that Flink creates a TimeWindow object each
>>> time the WindowAssigner is created.  I have not yet tested this, but I am
>>> wondering if this can become problematic if you have a very long sliding
>>> window with a small slide, such as a 24 hour window with a 1 minute slide.
>>> It seems this would create 1,440 TimeWindow objects per event.  Even at low
>>> event rates this would seem to result in an explosion of TimeWindow
>>> objects: at 1,000 events per second, you'd be creating 1,440,000 TimeWindow
>>> objects per second.  After 24 hours you'd have nearly 125 billion TimeWindow
>>> objects that would just begin to be purged.
>>>
>>> Does this analysis seem right?
>>>
>>> I suppose that means you should not use long sliding windows with
>>> small slides.
>>>
>>>
>>
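
The arithmetic in the thread (a 24 hour window with a 1 minute slide puts each element into 1,440 windows) can be checked with a small sketch. This is not Flink's code: the function name `assign_sliding_windows` and the sample timestamp are made up for illustration, and the sketch only assumes aligned windows whose start times are multiples of the slide.

```python
def assign_sliding_windows(timestamp_ms, size_ms, slide_ms):
    """Return (start, end) for every aligned sliding window that contains the timestamp."""
    # Start of the most recent window that begins at or before the timestamp.
    last_start = timestamp_ms - (timestamp_ms % slide_ms)
    windows = []
    start = last_start
    # Walk backwards one slide at a time while the window still covers the timestamp.
    while start > timestamp_ms - size_ms:
        windows.append((start, start + size_ms))
        start -= slide_ms
    return windows


MINUTE_MS = 60 * 1000
DAY_MS = 24 * 60 * MINUTE_MS

# Hypothetical event timestamp in epoch milliseconds.
timestamp_ms = 1_462_277_220_123
windows = assign_sliding_windows(timestamp_ms, size_ms=DAY_MS, slide_ms=MINUTE_MS)

windows_per_event = len(windows)                              # 1,440
assignments_per_second = windows_per_event * 1000             # at 1,000 events per second
assignments_per_day = assignments_per_second * 24 * 60 * 60   # roughly 124.4 billion

print(windows_per_event, assignments_per_second, assignments_per_day)
```

The count per element is size / slide, so the amplification depends only on that ratio and not on the event rate; a tumbling window (size equal to slide) gives a factor of 1.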

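The pane idea Stephan describes (each element goes into a single pane, and a full window is composed of the panes it spans) can be sketched for a simple per-key count. This is an illustration only and not the pending Flink feature: the class `PaneCounter` and its methods are hypothetical, and the sketch assumes the window size is a multiple of the slide so that one pane is exactly one slide long.

```python
from collections import defaultdict


class PaneCounter:
    """Per-key event counts stored once per pane and combined per window on demand."""

    def __init__(self, size_ms, slide_ms):
        if size_ms % slide_ms != 0:
            raise ValueError("this sketch assumes the window size is a multiple of the slide")
        self.size_ms = size_ms
        self.slide_ms = slide_ms
        self.panes = defaultdict(int)  # (key, pane_start) -> count
        self.writes = 0                # number of state updates performed

    def add(self, key, timestamp_ms):
        # Each element updates exactly one pane, regardless of the window length.
        pane_start = timestamp_ms - (timestamp_ms % self.slide_ms)
        self.panes[(key, pane_start)] += 1
        self.writes += 1

    def window_count(self, key, window_start_ms):
        # A window is the combination of the size / slide panes it spans.
        pane_starts = range(window_start_ms, window_start_ms + self.size_ms, self.slide_ms)
        return sum(self.panes.get((key, p), 0) for p in pane_starts)


counter = PaneCounter(size_ms=3000, slide_ms=1000)
for ts in (500, 1200, 1800, 3100):
    counter.add("k", ts)

print(counter.writes)                   # 4 writes for 4 elements
print(counter.window_count("k", 0))     # window [0, 3000) holds 3 elements
print(counter.window_count("k", 1000))  # window [1000, 4000) holds 3 elements
```

The cost moves from write time to read time: every element is written once, and each window evaluation combines size / slide panes. That trade only works when the aggregate can be merged across panes, as a count can.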