Return-Path: X-Original-To: apmail-flink-user-archive@minotaur.apache.org Delivered-To: apmail-flink-user-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id D54CD18213 for ; Tue, 24 Nov 2015 20:32:54 +0000 (UTC) Received: (qmail 95262 invoked by uid 500); 24 Nov 2015 20:32:54 -0000 Delivered-To: apmail-flink-user-archive@flink.apache.org Received: (qmail 95180 invoked by uid 500); 24 Nov 2015 20:32:54 -0000 Mailing-List: contact user-help@flink.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@flink.apache.org Delivered-To: mailing list user@flink.apache.org Received: (qmail 95169 invoked by uid 99); 24 Nov 2015 20:32:54 -0000 Received: from Unknown (HELO spamd2-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 24 Nov 2015 20:32:54 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id 179471A0B21 for ; Tue, 24 Nov 2015 20:32:54 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 2.879 X-Spam-Level: ** X-Spam-Status: No, score=2.879 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=3, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_PASS=-0.001] autolearn=disabled Authentication-Results: spamd2-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-us-west.apache.org ([10.40.0.8]) by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024) with ESMTP id JfHZbSwOTNHH for ; Tue, 24 Nov 2015 20:32:52 +0000 (UTC) Received: from mail-lf0-f49.google.com (mail-lf0-f49.google.com [209.85.215.49]) by mx1-us-west.apache.org (ASF Mail Server at mx1-us-west.apache.org) with ESMTPS id AC10D22F0A for ; Tue, 24 Nov 2015 20:32:51 +0000 (UTC) Received: by lfaz4 with SMTP id z4so36205339lfa.0 for ; Tue, 24 Nov 2015 12:32:50 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :content-type; bh=BjOzjU4wwHAVAa7PlFgG0SstgzKwEPeNl6bNmRwXdI8=; b=jOW2OU4kJQJFdiRHGjp+L5bEPihLD20Dak72NqQmqJ9Eux9eGfjhglt+P6J0Gcrivd wbM165iU8jLi8rIdo0GTgc+2WbMoB+liLCE1277U59uTvBOVd76CevhDMfWTODSai8uh D16LwfW9JVbdqtQG4JxIuJGlVzZ67Pgr2Y4mUddNZl2rKjSP2/8hl83MDCkHy5+dSzdT 2xGAA+ZK8KopFWoFpt2A6a1HJZ2TE7EEPIOkeocwv/TUKjB9g5dcZwvk6ZTbmg2NPe6n LSfKQ3KFX17w9iogpWu1e5J8SIISgKC7HxKYuQPDDXypY0YK/TnAY1kr1FOKXpl0shLv M7kg== X-Received: by 10.25.43.146 with SMTP id r140mr11366013lfr.140.1448397169938; Tue, 24 Nov 2015 12:32:49 -0800 (PST) MIME-Version: 1.0 Received: by 10.112.44.140 with HTTP; Tue, 24 Nov 2015 12:32:30 -0800 (PST) In-Reply-To: References: <9D1E46EF-8B9B-4689-9714-32D98090CEC8@gmail.com> From: Anton Polyakov Date: Tue, 24 Nov 2015 21:32:30 +0100 Message-ID: Subject: Re: Watermarks as "process completion" flags To: user@flink.apache.org Content-Type: multipart/alternative; boundary=001a1141170cced93f05254f3e2c --001a1141170cced93f05254f3e2c Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Hi Max thanks for reply. From what I understand window works in a way that it buffers records while window is open, then apply transformation once window close is triggered and pass transformed result. In my case then window will be open for few hours, then the whole amount of trades will be processed once window close is triggered. Actually I want to process events as they are produced without buffering them. It is more like a stream with some special mark versus windowing seems more like a batch (if I understand it correctly). In other words - buffering and waiting for window to close, then processing will be equal to simply doing one-off processing when all events are produced. I am looking for a solution when I am processing events as they are produced and when source signals "done" my processing is also nearly done. On Tue, Nov 24, 2015 at 2:41 PM, Maximilian Michels wrote: > Hi Anton, > > You should be able to model your problem using the Flink Streaming > API. The actions you want to perform on the streamed records > correspond to transformations on Windows. You can indeed use > Watermarks to signal the window that a threshold for an action has > been reached. Otherwise an eviction policy should also do it. > > Without more details about what you want to do I can only refer you to > the streaming API documentation: > Please see > https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/streami= ng_guide.html > > Thanks, > Max > > On Sun, Nov 22, 2015 at 8:53 PM, Anton Polyakov > wrote: > > Hi > > > > I am very new to Flink and in fact never used it. My task (which I > currently solve using home grown Redis-based solution) is quite simple - = I > have a system which produces some events (trades, it is a financial syste= m) > and computational chain which computes some measure accumulatively over > these events. Those events form a long but finite stream, they are produc= ed > as a result of end of day flow. Computational logic forms a processing DA= G > which computes some measure over these events (VaR). Each trade is > processed through DAG and at different stages might produce different set > of subsequent events (like return vectors), eventually they all arrive in= to > some aggregator which computes accumulated measure (reducer). > > > > Ideally I would like to process trades as they appear (i.e. stream them= ) > and once producer reaches end of portfolio (there will be no more trades)= , > I need to write final resulting measure and mark it as =E2=80=9Cend of da= y record=E2=80=9D. > Of course I also could use a classical batch - i.e. wait until all trades > are produced and then batch process them, but this will be too inefficien= t. > > > > If I use Flink, I will need a sort of watermark saying - =E2=80=9Cdone,= no more > trades=E2=80=9D and once this watermark reaches end of DAG, final measure= can be > saved. More generally would be cool to have an indication at the end of D= AG > telling to which input stream position current measure corresponds. > > > > I feel my problem is very typical yet I can=E2=80=99t find any solution= . All > examples operate either on infinite streams where nobody cares about > completion or classical batch examples which rely on fact all input data = is > ready. > > > > Can you please hint me. > > > > Thank you vm > > Anton > --001a1141170cced93f05254f3e2c Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
Hi Max

thanks for reply. From = what I understand window works in a way that it buffers records while windo= w is open, then apply transformation once window close is triggered and pas= s transformed result.
In my case then window will be open for few hours= , then the whole amount of trades will be processed once window close is tr= iggered. Actually I want to process events as they are produced without buf= fering them. It is more like a stream with some special mark versus windowi= ng seems more like a batch (if I understand it correctly).

In = other words - buffering and waiting for window to close, then processing wi= ll be equal to simply doing one-off processing when all events are produced= . I am looking for a solution when I am processing events as they are produ= ced and when source signals "done" my processing is also nearly d= one.


On Tue, Nov 24, 2015 at 2:41 PM, Maximilian Michels <mxm@apache.org> wrote:
Hi Anton,

You should be able to model your problem using the Flink Streaming
API. The actions you want to perform on the streamed records
correspond to transformations on Windows. You can indeed use
Watermarks to signal the window that a threshold for an action has
been reached. Otherwise an eviction policy should also do it.

Without more details about what you want to do I can only refer you to
the streaming API documentation:
Please see
htt= ps://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/streaming_gu= ide.html

Thanks,
Max

On Sun, Nov 22, 2015 at 8:53 PM, Anton Polyakov
<polyakov.anton@gmail.com> wrote:
> Hi
>
> I am very new to Flink and in fact never used it. My task (which I cur= rently solve using home grown Redis-based solution) is quite simple - I hav= e a system which produces some events (trades, it is a financial system) an= d computational chain which computes some measure accumulatively over these= events. Those events form a long but finite stream, they are produced as a= result of end of day flow. Computational logic forms a processing DAG whic= h computes some measure over these events (VaR). Each trade is processed th= rough DAG and at different stages might produce different set of subsequent= events (like return vectors), eventually they all arrive into some aggrega= tor which computes accumulated measure (reducer).
>
> Ideally I would like to process trades as they appear (i.e. stream the= m) and once producer reaches end of portfolio (there will be no more trades= ), I need to write final resulting measure and mark it as =E2=80=9Cend of d= ay record=E2=80=9D. Of course I also could use a classical batch - i.e. wai= t until all trades are produced and then batch process them, but this will = be too inefficient.
>
> If I use Flink, I will need a sort of watermark saying - =E2=80=9Cdone= , no more trades=E2=80=9D and once this watermark reaches end of DAG, final= measure can be saved. More generally would be cool to have an indication a= t the end of DAG telling to which input stream position current measure cor= responds.
>
> I feel my problem is very typical yet I can=E2=80=99t find any solutio= n. All examples operate either on infinite streams where nobody cares about= completion or classical batch examples which rely on fact all input data i= s ready.
>
> Can you please hint me.
>
> Thank you vm
> Anton

--001a1141170cced93f05254f3e2c--