Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 9F476200BBF for ; Mon, 14 Nov 2016 21:54:10 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id 9DD4A160B06; Mon, 14 Nov 2016 20:54:10 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id E7FB7160AF4 for ; Mon, 14 Nov 2016 21:54:09 +0100 (CET) Received: (qmail 49296 invoked by uid 500); 14 Nov 2016 20:54:08 -0000 Mailing-List: contact user-help@flink.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@flink.apache.org Delivered-To: mailing list user@flink.apache.org Received: (qmail 49276 invoked by uid 99); 14 Nov 2016 20:54:08 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 14 Nov 2016 20:54:08 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id 59FF8C7473 for ; Mon, 14 Nov 2016 20:54:08 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -3.7 X-Spam-Level: X-Spam-Status: No, score=-3.7 tagged_above=-999 required=6.31 tests=[RCVD_IN_DNSWL_LOW=-0.7, RP_MATCHES_RCVD=-2.999, SPF_PASS=-0.001] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id D29VdvuTtrxv for ; Mon, 14 Nov 2016 20:54:06 +0000 (UTC) Received: from posta.szn.cz (firma-smtp1.seznam.cz [77.75.74.246]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTP id E1E495FC52 for ; Mon, 14 Nov 2016 20:54:05 +0000 (UTC) Received: from [172.16.1.141] (88.101.55.1) by Exchange1.kancelar.seznam.cz (10.0.3.23) with Microsoft SMTP Server (TLS) id 15.1.396.30; Mon, 14 Nov 2016 21:53:37 +0100 Subject: Re: WindowOperator - element's timestamp To: Aljoscha Krettek , References: <57e18382-381d-e5a4-477a-1a0a5df32f02@firma.seznam.cz> CC: Kostas Kloudas From: Petr Novotnik Message-ID: Date: Mon, 14 Nov 2016 21:53:36 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="utf-8"; format=flowed Content-Transfer-Encoding: 7bit X-Originating-IP: [88.101.55.1] X-ClientProxiedBy: Exchange1.kancelar.seznam.cz (10.0.3.23) To Exchange1.kancelar.seznam.cz (10.0.3.23) archived-at: Mon, 14 Nov 2016 20:54:10 -0000 Aljoscha, thanks for your response. The use-case I'm after is basically providing "early" (inaccurate) results to downstream consumers. Suppose we're running aggregations for daily time windows, but we don't want to wait a whole day to see results. The idea is to fire the windows continuously before they hit their end of life (at which point they fill be fired_and_purged and will provide the final, accurate answer.) However, if all of these "early" fired panes emit elements with a timestamp equaling the end-of-the-window, stateful downstream operators a) have no chance distinguishing between the different panes of the same window b) and don't have any chance to set-up timers before the watermark at the downstream operator advances to the "end of the day". Hope this clarifies my motivation a bit, P. On 11/14/2016 03:22 PM, Aljoscha Krettek wrote: > Hi, > I'm afraid the ContinuousEventTimeTrigger is a bit misleading and should > probably be removed. The problem is that a watermark T signals that we > won't see elements with a timestamp < T in the future. It does not > signal that we haven't already seen elements with a timestamp > T. So > this cannot be used to trigger at different stages of a given window. > > Do you have a concrete use case in mind for which you wanted to use > ContinuousEventTimeTrigger? > > Cheers, > Aljoscha > > On Mon, 14 Nov 2016 at 09:58 Ufuk Celebi > wrote: > > Looping in Kostas and Aljoscha who should know what's the expected > behaviour here ;) > > > On 11 November 2016 at 16:17:23, Petr Novotnik > (petr.novotnik@firma.seznam.cz > ) wrote: > > Hello, > > > > I'm struggling to understand the following behaviour of the > > `WindowOperator` and would appreciate some insight from experts: > > > > In particular I'm thinking about the following hypothetical data flow: > > > > input.keyBy(..) > > .window(TumblingEventTimeWindows.of(..)) > > .apply(..) > > ... > > .keyBy(..) > > .window(CustomWindowAssignerUsingATriggerBasedOnTheElementsStamp) > > .apply(..) > > > > When the first window operator fires a window based on the timer, the > > emitted elements are assigned a timestamp which equals > > `window.maxTimestamp()`. This stamp is then available in the second > > window operator's trigger through the `onElement` method. So far > so good. > > > > However, when using `ContinuousEventTimeTrigger` (simply put when > firing > > the window multiple times at different times in its lifecycle) in the > > first window operator, _all_ of the elements of this window - no > matter > > whether fired as a partial or the final window result - will > arrive with > > the same stamp in the (downstream) operators. > > > > This make it practically impossible to use again > > `ContinuousEventTimeTrigger` (or similar) in the second window > operator > > to achieve "early firing" again. > > > > This is surprising. I would expect the elements to be assigned the > stamp > > of the timer which fired them (which will be window#maxTimestamp() for > > `TumblingEventTimeWindows`). Is there any particular reason for the > > unconditional assignment to `window.maxTimestamp()`? > > > > Many thanks in advance, > > P. > > >