flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Abhishek R. Singh" <abhis...@tetrationanalytics.com>
Subject Re: custom sources
Date Thu, 19 May 2016 17:48:09 GMT
Thanks - appreciate the response.

The reason I want to control these things is this - my state grows and shrinks over time (user
level windowing as application state). I would like to trigger checkpoints just after the
state has been crunched/compressed (at the window boundary). Say I crunch every 10 seconds
and slide the window by 8 seconds (2 second overlap). My window buffers would only need to
checkpoint 2s worth of in-flight data (apart from compressed state for the other 8 seconds).
With flink this seems hard given the windowing is at a partition level and not a global window.
Even if I use event time, every partition will be at different points (of when that partition
becomes ready to crunch). OTOH, if I were to introduce barriers at source, I could ensure
that I get a good point globally to crunch and checkpoint my state.

Does the checkpoint co-ordinator provide triggers to application to “crunch now" and reduce
state? BTW, this may not be optimal, because applications would have natural triggers to crunch
windows (and can’t just react to these triggers at random points).

Is there any benefit of allowing flink to do the windowing (in terms of getting smaller checkpoints)?
I read that flink does not checkpoint in-flight data, but this would be impossible with event
time and out of order processing by operators (I can see how it would work with processing
time, and in order crunching). When the barrier hits the operator, flink will have to checkpoint
all active event time windows. Given these event time windows have different trigger points,
it might help to checkpoint right after trigger evaluation so the state is compressed (and
there is less things to checkpoint). 

There seems to be some relationship between watermarks, triggers and checkpoint that is someone
not being leveraged.


> On May 19, 2016, at 5:48 AM, Till Rohrmann <trohrmann@apache.org> wrote:
> Hi Abhishek,
> you can implement custom sources by implementing the SourceFunction or the ParallelSourceFunction
interface and then calling StreamExecutionEnvironment.addSource.
> At the moment, it is not possible to control manually or from a source function when
to trigger a checkpoint. This is the responsibility of the CheckpointCoordinator.
> Cheers,
> Till
> On Tue, May 17, 2016 at 8:28 PM, Abhishek R. Singh <abhishsi@tetrationanalytics.com
<mailto:abhishsi@tetrationanalytics.com>> wrote:
> Hi,
> Can we define custom sources in link? Control the barriers and (thus) checkpoints at
good watermark points?
> -Abhishek-

View raw message