flink-user mailing list archives

From Fabian Hueske <fhue...@gmail.com>
Subject Re: get start and end time stamp from time window
Date Thu, 12 May 2016 09:59:17 GMT
Hi Martin,

You can use a FoldFunction and a WindowFunction together on the same
window. The FoldFunction is applied eagerly as elements arrive, so the
window state holds only one element. When the window closes, the aggregated
element is passed to the WindowFunction, where you can add the start and end
timestamps. The iterator of the WindowFunction will provide only one element
(the aggregate).

See the apply method on WindowedStream with the following signature:
apply(initialValue: R, foldFunction: FoldFunction[T, R], function:
WindowFunction[R, R, K, W]): DataStream[R]
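
To make the mechanism concrete, here is a minimal plain-Scala model of the
idea. It deliberately does not use Flink: Window, Summary, and
FoldThenWindowSketch are hypothetical names for illustration only, and
timestamps are assumed to be non-negative milliseconds. In Flink itself the
bounds would come from the window object handed to the WindowFunction (for a
TimeWindow, getStart and getEnd).

```scala
// Plain-Scala model of "eager fold + window function"; no Flink dependency.
// All names below are hypothetical and exist only for this sketch.

// Stand-in for a time window: start inclusive, end exclusive, in milliseconds.
final case class Window(start: Long, end: Long)

// Result type: the folded value plus the window bounds.
final case class Summary(start: Long, end: Long, sum: Long)

object FoldThenWindowSketch {
  // Assign a non-negative timestamp to a tumbling window of the given size.
  def windowFor(ts: Long, size: Long): Window = {
    val start = ts - (ts % size)
    Window(start, start + size)
  }

  // Fold step: sees only the accumulator and the element, never the window.
  def fold(acc: Summary, value: Long): Summary =
    acc.copy(sum = acc.sum + value)

  // Window step: called once per window with the single folded element,
  // which is where the start and end timestamps get attached.
  def finish(window: Window, folded: Iterable[Summary]): Summary =
    folded.head.copy(start = window.start, end = window.end)

  // events are (timestamp, value) pairs.
  def run(events: Seq[(Long, Long)], size: Long): Seq[Summary] = {
    val initial = Summary(0L, 0L, 0L)
    // State per window is one accumulator, not the buffered elements.
    val state = events.foldLeft(Map.empty[Window, Summary]) {
      case (st, (ts, v)) =>
        val w = windowFor(ts, size)
        st.updated(w, fold(st.getOrElse(w, initial), v))
    }
    state.toSeq.sortBy(_._1.start).map { case (w, acc) => finish(w, Seq(acc)) }
  }
}
```

The point of the sketch is that the fold never needs the timestamps and the
state per window stays at a single accumulator. The window bounds are filled
in once, at the end, by the step that has access to the window.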

Cheers, Fabian

2016-05-11 20:16 GMT+02:00 Martin Neumann <mneumann@sics.se>:

> Hej,
> I have a windowed stream and I want to run a (generic) fold function on
> it. The result should have the start and the end time stamp of the window
> as fields (so I can relate it to the original data). *Is there a simple
> way to get the timestamps from within the fold function?*
> I could find the lowest and the highest ts as part of the fold function
> but that would not be very accurate, especially when the number of events
> in the window is low. Also, I want to write it in a generic way so I can use
> it even if the data itself does not contain a time stamp field (running on
> processing time).
> I have looked into using a WindowFunction where I would have access to the
> start and end timestamp. I have not quite figured out how I would implement
> a fold function using this. Also, from my understanding this approach would
> require holding the whole window in memory which is not a good option since
> the window data can get very large.
> Is there a better way of doing this?
> cheers Martin
