flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Aljoscha Krettek <aljos...@apache.org>
Subject Re: Finding the average temperature
Date Wed, 17 Feb 2016 18:14:34 GMT
the name pre-aggregation is a bit misleading. I have started calling it incremental aggregation
because it does not work like a combiner.

What it does is to incrementally fold (or reduce) elements as they arrive at the window operator.
This reduces the amount of required space, because, otherwise, all the elements would have
to be stored before the window is triggered. When using an incremental fold (or reduce) the
WindowFunction only get’s the one final result of the incremental aggregation.

> On 17 Feb 2016, at 09:27, Stefano Baghino <stefano.baghino@radicalbit.io> wrote:
> Hi Nirmalaya,
> my reply was based on me misreading your original post, thinking you had a batch of data,
not a stream. I see that the apply method can also take a reducer the pre-aggregates your
data before passing it to the window function. I suspect that pre-aggregation runs locally
just like a combiner would, but I'm really not sure about it. We should have more feedback
on this regard.
> On Tue, Feb 16, 2016 at 2:19 AM, Nirmalya Sengupta <sengupta.nirmalya@gmail.com>
> Hello Stefano <stefano.baghino@radicalbit.io>
> Sorry for the late reply. Many thanks for taking effort to write and share an example
code snippet.
> I have been playing with the countWindow behaviour for some weeks now and I am generally
aware of the functionality of countWindowAll(). For my useCase, where I _have to observe_
the entire stream as it founts in, using countWindowAll() is probably the most obvious solution.
This is what you recommend too. However, because this is going to use 1 thread only (or 1
node only in a cluster), I was thinking about ways to make use of the 'distributedness' of
the framework. Hence, my question.
> Your reply leads to me read and think a bit more. If I have to use parallelism to achieve
what I want to achieve, I think managing a ValueState of my own is possibly the solution.
If you have any other thoughts, please share. 
> From your  earlier response: '... you can still enjoy a high level of parallelism up
until the last operator by using a combiner, which is basically a reducer that operates locally
...'. Could you elaborate this a bit, whenever you have time?
> -- Nirmalya
> -- 
> Software Technologist
> http://www.linkedin.com/in/nirmalyasengupta
> "If you have built castles in the air, your work need not be lost. That is where they
should be.
> Now put the foundation under them."
> -- 
> BR,
> Stefano Baghino
> Software Engineer @ Radicalbit

View raw message