flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Niels Basjes <Ni...@basjes.nl>
Subject Re: Writing groups of Windows to files
Date Tue, 04 Jul 2017 09:24:38 GMT
Hi Fabian,

On Fri, Jun 30, 2017 at 6:27 PM, Fabian Hueske <fhueske@gmail.com> wrote:

> If I understand your use case correctly, you'd like to hold back all
> events of a session until it ends/timesout and then write all events out.
> So, instead of aggregating per session (the common use case), you'd just
> like to collect the event.

Yes, and I want to write the completed sessions into files. No aggregation
or filtering at all.
The idea is that our DataScience guys who want to analyze sessions have a
much easier task of knowing for certain that they have 'a set of complete

> I would implement a simple WindowFunction that just forwards all events
> that it receives from the iterator. Conceptually, the window will just
> collect the events and emit them when the session ended/timedout.
> Then you can add BucketingSink which writes out the events. I'm not sure
> if the BucketingSInk supports buckets based on event-time though. Maybe you
> would need to adapt it a bit to guarantee that all rows of the same session
> are written to the same file.
> Alternatively, the WindowFunction could also emit one large record which
> is a List or Array of events belonging to the same session.

That last one was the idea I had.
Have a window function that keeps the Window until finished, then output
that with the eventtime of the 'end of the session' and use the bucketing
sink to write those to disk.

The problem (in my mind) that I have with this is that a single session
with a LOT of events would bring the system to a halt because it can
trigger OOM events.

How should I handle those?


View raw message