flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From xingcan <xingc...@gmail.com>
Subject Re: About delta awareness caches
Date Thu, 22 Dec 2016 06:13:42 GMT
Hi Aljoscha,

First of all, sorry for that I missed your prompt reply : (

In these days, I've been learning the implementation mechanism of window in
Flink.

I think the main difference between the window in Storm and Flink (from the
API level) is that, Storm maintains only one window while Flink maintains
several isolated windows. Due to that, Storm users can be aware of the
transformation (tuple add and expire) of a window and take actions on each
window modification (sliding window forwarding) while Flink users can only
implement functions on one and another complete window as if they are
independent of each other (actually they may get quite a few tuples in
common).

Objectively speaking, the window API provided by Flink is more formalize
and easy to use. However, for sliding window with high-capacity and short
interval (e.g. 5m and 1s), each tuple will be calculated redundantly (maybe
300 times in the example?). Though it provide the pane optimization, I
think it's far from enough as the optimization can only be applied on
reduce functions which restrict the input and output data type to be the
same. Some other functions, e.g., the MaxAndMin function which take numbers
as input and output a max&min pair and the Average function, which should
avoid redundant calculations can not be satisfied.

Actually, I just wondering if a "mergeable fold function" could be added
(just *like* this https://en.wikipedia.org/wiki/Mergeable_heap). I know it
may violate some principles of Flink (probably about states), but I insist
that unnecessary calculations should be avoided in stream processing.

So, could you give some advices, I am all ears : ), or if you think that
is feasible, I'll think carefully and try to complete it.

Thank you and merry Christmas.

Best,

- Xingcan

On Thu, Dec 1, 2016 at 7:56 PM, Aljoscha Krettek <aljoscha@apache.org>
wrote:

> I'm not aware of how windows work in Storm. If you could maybe give some
> details on your use case we could figure out together how that would map to
> Flink windows.
>
> Cheers,
> Aljoscha
>
> On Tue, 29 Nov 2016 at 15:47 xingcan <xingcanc@gmail.com> wrote:
>
>> Hi all,
>>
>> Recently I tried to transfer some old applications from Storm to Flink.
>> In Storm, the window implementation (TupleWindow) gets two methods named
>> getNew() and getExpired() which supply the delta information of a window
>> and therefore we wrote some stateful caches that are aware of them.
>> However, it seems that Flink deals with the window in a different way and
>> supplies more "formalized" APIs.
>> So, any tips on how to adapt these delta awareness caches in Flink or do
>> some refactors to make them suitable?
>>
>> Thanks.
>>
>> Best,
>> Xingcan
>>
>>

Mime
View raw message