flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Aljoscha Krettek <aljos...@apache.org>
Subject Re: Merging N parallel/partitioned WindowedStreams together, one-to-one, into a global window stream
Date Tue, 11 Oct 2016 13:03:31 GMT
are you windowing based on event time?


On Fri, 7 Oct 2016 at 09:28 Fabian Hueske <fhueske@gmail.com> wrote:

> If you are using time windows, you can access the TimeWindow parameter of
> the WindowFunction.apply() method.
> The TimeWindow contains the start and end timestamp of a window (as Long)
> which can act as keys.
> If you are using count windows, I think you have to use a counter as you
> described.
> 2016-10-07 1:06 GMT+02:00 AJ Heller <aj@drfloob.com>:
> Thank you Fabian, I think that solves it. I'll need to rig up some tests
> to verify, but it looks good.
> I used a RichMapFunction to assign ids incrementally to windows (mapping
> STREAM_OBJECT to Tuple2<Long, STREAM_OBJECT> using a private long value in
> the mapper that increments on every map call). It works, but by any chance
> is there a more succinct way to do it?
> On Thu, Oct 6, 2016 at 1:50 PM, Fabian Hueske <fhueske@gmail.com> wrote:
> Maybe this can be done by assigning the same window id to each of the N
> local windows, and do a
> .keyBy(windowId)
> .countWindow(N)
> This should create a new global window for each window id and collect all
> N windows.
> Best, Fabian
> 2016-10-06 22:39 GMT+02:00 AJ Heller <aj@drfloob.com>:
> The goal is:
>  * to split data, random-uniformly, across N nodes,
>  * window the data identically on each node,
>  * transform the windows locally on each node, and
>  * merge the N parallel windows into a global window stream, such that one
> window from each parallel process is merged into a "global window" aggregate
> I've achieved all but the last bullet point, merging one window from each
> partition into a globally-aggregated window output stream.
> To be clear, a rolling reduce won't work because it would aggregate over
> all previous windows in all partitioned streams, and I only need to
> aggregate over one window from each partition at a time.
> Similarly for a fold.
> The closest I have found is ParallelMerge for ConnectedStreams, but I have
> not found a way to apply it to this problem. Can flink achieve this? If so,
> I'd greatly appreciate a point in the right direction.
> Cheers,
> -aj

View raw message