flink-user mailing list archives

From Matt <dromitl...@gmail.com>
Subject Re: Updating a Tumbling Window every second?
Date Mon, 19 Dec 2016 20:43:33 GMT
Fabian,

Thanks for your answer. Since elements in B are expensive to create, I
wanted to reuse them. I understand I can plug two consumers into stream A,
but in that case, if I'm not wrong, I would have to create each element of
B twice: once to save it into stream B and once to create C objects for
stream C.

Anyway, I've already solved this problem a few days back.

Regards,
Matt

On Mon, Dec 19, 2016 at 5:57 AM, Fabian Hueske <fhueske@gmail.com> wrote:

> Hi Matt,
>
> the combination of a tumbling time window and a count window is one way to
> define a sliding window.
> In your example, a 30 secs tumbling window and a (3,1) count window
> result in a time sliding window of 90 secs width and 30 secs slide.
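
A small worked example may make this equivalence concrete. The sketch below is plain Python, not Flink code. The event times and the helper names (tumbling_counts, count_window_sums, sliding_counts) are made up for illustration, B is reduced to a simple count of A elements, and C is taken as the sum of the last three B values.

```python
from collections import deque

def tumbling_counts(timestamps, size):
    """Count events per tumbling window; returns {window_start: count}."""
    counts = {}
    for t in timestamps:
        start = (t // size) * size
        counts[start] = counts.get(start, 0) + 1
    return counts

def count_window_sums(values, n):
    """Count window (n, 1): sum of the last n values, emitted once n values exist."""
    window, out = deque(maxlen=n), []
    for v in values:
        window.append(v)
        if len(window) == n:
            out.append(sum(window))
    return out

def sliding_counts(timestamps, size, slide, first_end, last_end):
    """Count events in [end - size, end) for end = first_end, first_end + slide, ..."""
    return [sum(1 for t in timestamps if end - size <= t < end)
            for end in range(first_end, last_end + 1, slide)]

# Hypothetical stream A: integer event times in seconds, covering 150 s.
events = [0, 5, 12, 31, 44, 60, 61, 95, 100, 118, 119, 121, 140, 149]

# Stream B: one count per 30 s tumbling window.
b = tumbling_counts(events, 30)
b_values = [b.get(start, 0) for start in range(0, 150, 30)]

# Stream C, built two ways.
c_from_b = count_window_sums(b_values, 3)            # (3,1) count window over B
c_direct = sliding_counts(events, 90, 30, 90, 150)   # 90 s / 30 s sliding window over A

print(b_values, c_from_b, c_direct)
```

Under these assumptions both routes give the same values for C, which is the sense in which the two window definitions coincide.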
>
> You could define a time sliding window of 90 secs width and 1 secs slide
> on stream A to get a stream C with faster updates.
> If you still need stream B with the 30 secs tumbling window, you can have
> both windows defined on stream A.
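
The two-windows-on-A idea above can be sketched as follows. This is again a plain Python simulation under assumed data, not Flink code: window_count and the event times are hypothetical, and each window is reduced to a count of A elements. In Flink itself the second window would presumably be a sliding time window assigner of 90 s size and 1 s slide applied to stream A.

```python
def window_count(timestamps, start, end):
    """Number of events with start <= t < end."""
    return sum(1 for t in timestamps if start <= t < end)

# Hypothetical stream A: integer event times in seconds, covering 150 s.
events = [0, 5, 12, 31, 44, 60, 61, 95, 100, 118, 119, 121, 140, 149]

# Stream B: 30 s tumbling window on A, keyed here by window end time.
stream_b = {end: window_count(events, end - 30, end)
            for end in range(30, 151, 30)}

# Stream C: 90 s sliding window with a 1 s slide, also defined on A.
stream_c = {end: window_count(events, end - 90, end)
            for end in range(90, 151)}

# C now updates every second; on 30 s boundaries it matches the last three Bs.
for end in (90, 120, 150):
    assert stream_c[end] == stream_b[end - 60] + stream_b[end - 30] + stream_b[end]
```

The trade-off is the one raised in the reply at the top of this message: C is computed from A directly, so whatever work went into building each B element is not reused.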
>
> Hope this helps,
> Fabian
>
> 2016-12-16 12:58 GMT+01:00 Matt <dromitlabs@gmail.com>:
>
>> I have reduced the problem to a simple image [1].
>>
>> Those shown on the image are the streams I have, and the problem now is
>> how to create a custom window assigner such that objects in B that *don't
>> share* elements in A, are put together in the same window.
>>
>> Why? Because in order to create elements in C (triangles), I have to
>> process n *independent* elements of B (n=2 in the example).
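
One possible way to group independent elements of B, sketched in plain Python under stated assumptions. It is not the solution actually used in this thread, which is not described. It assumes B comes from a sliding window of integer width `size` seconds with a 1 s slide, so that two B elements share no elements of A exactly when their window ends are at least `size` apart. Grouping by `window_end % size` then puts only non-overlapping B elements together. The function name independent_groups and the (window_end, value) representation are hypothetical.

```python
from collections import defaultdict, deque

def independent_groups(b_elements, size, n):
    """Group B elements whose source windows do not overlap.

    b_elements: (window_end, value) pairs in arrival order, produced by a
    sliding window of width `size` (assumed: integer ends, slide divides size).
    Returns one list of n (window_end, value) pairs per emitted group.
    """
    by_phase = defaultdict(lambda: deque(maxlen=n))
    out = []
    for end, value in b_elements:
        group = by_phase[end % size]   # same phase => ends differ by a multiple of size
        group.append((end, value))
        if len(group) == n:            # behaves like a count window (n, 1) per phase
            out.append(list(group))
    return out

# Hypothetical B stream: one element per second, value is a placeholder.
b_stream = [(end, "b@%d" % end) for end in range(30, 100)]
groups = independent_groups(b_stream, size=30, n=2)
print(groups[0])   # first group pairs the windows ending at 30 and 60
```

In Flink terms this would roughly correspond to keying stream B by the window phase and applying a count window of (n, 1) per key, though that mapping is an assumption and has not been tested here.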
>>
>> Maybe there's a better or simpler way to do this. Any idea is appreciated!
>>
>> Regards,
>> Matt
>>
>> [1] http://i.imgur.com/dG5AkJy.png
>>
>> On Thu, Dec 15, 2016 at 3:22 AM, Matt <dromitlabs@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I have a rather simple problem with a difficult explanation...
>>>
>>> I have 3 streams, one of objects of class A (stream A), one of class B
>>> (stream B) and one of class C (stream C). The elements of A are generated
>>> at a rate of about 3 times every second. Elements of type B encapsulate
>>> some key features of the stream A (like the number of elements of A in the
>>> window) during the last 30 seconds (tumbling window 30s). Finally, the
>>> elements of type C contain statistics (for simplicity let's say the
>>> average of elements processed by each element in B) of the last 3 elements
>>> in B and are produced on every new element of B (count window 3, 1).
>>>
>>> Illustrative example, () and [] denote windows:
>>>
>>> ... [a10 a9 a8] [a7 a6] [a5 a4] [a3 a2 a1]
>>> ... (b4 [b3 b2) b1]
>>> ... [c2] [c1]
>>>
>>> This works fine, except for a dashboard that depends on the elements of
>>> C to be updated, and 30s is way too big of a delay. I thought I could
>>> change the tumbling window for a sliding window of size 30s and a slide of
>>> 1s, but this doesn't work.
>>>
>>> If I use a sliding window to create elements of B as mentioned, each
>>> count window would contain 3 elements of B, and I would get one element of
>>> C every second as intended, but those elements in B encapsulate almost the
>>> same elements of A. This results in stats that are wrong.
>>>
>>> For instance, c1 may have the average of b1, b2 and b3. But b1, b2 and
>>> b3 share most of the elements from stream A.
>>>
>>> Question: is there any way to create a count window with the last 3
>>> elements of B that would have gone into the same tumbling window, not with
>>> the last 3 consecutive elements?
>>>
>>> I hope the problem is clear, don't hesitate to ask for further
>>> clarification!
>>>
>>> Regards,
>>> Matt
>>>
>>>
>>
>