flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Anton Polyakov <polyakov.an...@gmail.com>
Subject Re: Watermarks as "process completion" flags
Date Tue, 24 Nov 2015 20:32:30 GMT
Hi Max

thanks for reply. From what I understand window works in a way that it
buffers records while window is open, then apply transformation once window
close is triggered and pass transformed result.
In my case then window will be open for few hours, then the whole amount of
trades will be processed once window close is triggered. Actually I want to
process events as they are produced without buffering them. It is more like
a stream with some special mark versus windowing seems more like a batch
(if I understand it correctly).

In other words - buffering and waiting for window to close, then processing
will be equal to simply doing one-off processing when all events are
produced. I am looking for a solution when I am processing events as they
are produced and when source signals "done" my processing is also nearly
done.


On Tue, Nov 24, 2015 at 2:41 PM, Maximilian Michels <mxm@apache.org> wrote:

> Hi Anton,
>
> You should be able to model your problem using the Flink Streaming
> API. The actions you want to perform on the streamed records
> correspond to transformations on Windows. You can indeed use
> Watermarks to signal the window that a threshold for an action has
> been reached. Otherwise an eviction policy should also do it.
>
> Without more details about what you want to do I can only refer you to
> the streaming API documentation:
> Please see
> https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/streaming_guide.html
>
> Thanks,
> Max
>
> On Sun, Nov 22, 2015 at 8:53 PM, Anton Polyakov
> <polyakov.anton@gmail.com> wrote:
> > Hi
> >
> > I am very new to Flink and in fact never used it. My task (which I
> currently solve using home grown Redis-based solution) is quite simple - I
> have a system which produces some events (trades, it is a financial system)
> and computational chain which computes some measure accumulatively over
> these events. Those events form a long but finite stream, they are produced
> as a result of end of day flow. Computational logic forms a processing DAG
> which computes some measure over these events (VaR). Each trade is
> processed through DAG and at different stages might produce different set
> of subsequent events (like return vectors), eventually they all arrive into
> some aggregator which computes accumulated measure (reducer).
> >
> > Ideally I would like to process trades as they appear (i.e. stream them)
> and once producer reaches end of portfolio (there will be no more trades),
> I need to write final resulting measure and mark it as “end of day record”.
> Of course I also could use a classical batch - i.e. wait until all trades
> are produced and then batch process them, but this will be too inefficient.
> >
> > If I use Flink, I will need a sort of watermark saying - “done, no more
> trades” and once this watermark reaches end of DAG, final measure can be
> saved. More generally would be cool to have an indication at the end of DAG
> telling to which input stream position current measure corresponds.
> >
> > I feel my problem is very typical yet I can’t find any solution. All
> examples operate either on infinite streams where nobody cares about
> completion or classical batch examples which rely on fact all input data is
> ready.
> >
> > Can you please hint me.
> >
> > Thank you vm
> > Anton
>

Mime
View raw message