Date: Wed, 28 Jun 2017 10:32:00 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: issues@flink.apache.org
Subject: [jira] [Commented] (FLINK-6969) Add support for deferred computation for group window aggregates

    [ https://issues.apache.org/jira/browse/FLINK-6969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16066306#comment-16066306 ]

ASF GitHub Bot commented on FLINK-6969:
---------------------------------------

Github user wuchong commented on the issue:

    https://github.com/apache/flink/pull/4183

    Hi @fhueske,

    jincheng and I discussed this offline and have reached a shared opinion. We think the approach of window (with custom trigger) + custom operator will increase the latency when multiple window aggregates are applied. For example, consider deferring computation by 1 hour with two levels of window aggregates. The output and the watermark of the first-level window are already deferred by 1 hour, so that output is in fact in order by then. However, the second window still defers its computation by another hour, so the final result is delayed by two hours.

    We think we only need to hold the watermark back at the source; all downstream window operators will then defer their computation by the given offset, and the end-to-end latency is just that offset.
    Therefore, we propose adding a custom operator (to offset the watermark) after the source (the DataStream at the root of the physical plan tree). No custom trigger is needed.

    What do you think?


> Add support for deferred computation for group window aggregates
> ----------------------------------------------------------------
>
>                 Key: FLINK-6969
>                 URL: https://issues.apache.org/jira/browse/FLINK-6969
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table API & SQL
>            Reporter: Fabian Hueske
>            Assignee: sunjincheng
>
> Deferred computation is a strategy to deal with late-arriving data and avoid updates of previous results. Instead of computing a result as soon as possible (i.e., when a corresponding watermark is received), deferred computation adds a configurable amount of slack time in which late data is accepted before the result is computed. For example, instead of computing a tumbling window of 1 hour at each full hour, we can add a deferred computation interval of 15 minutes to compute the result at a quarter past each full hour.
> This approach adds latency but can reduce the number of updates, especially in use cases where the user cannot influence the generation of watermarks. It is also useful if the data is emitted to a system that cannot update results (files or Kafka). The deferred computation interval should be configured via the {{QueryConfig}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
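
The latency argument in the comment above can be checked with a small simulation. This is only a sketch in plain Python, not Flink code: `window_end`, `final_emit_watermark`, and the `per_operator` flag are hypothetical names introduced for illustration. It assumes tumbling windows aligned to the epoch, and it assumes that a window operator with its own deferral forwards its watermark late by the same amount (as the comment describes for the first-level window).

```python
HOUR_MS = 60 * 60 * 1000
MINUTE_MS = 60 * 1000


def window_end(ts, size):
    """Exclusive end of the epoch-aligned tumbling window containing ts."""
    return ts - (ts % size) + size


def final_emit_watermark(event_ts, window_sizes, defer, per_operator):
    """Source watermark at which the last window in the chain emits.

    per_operator=True  models "window with custom trigger + custom operator":
                       every window waits `defer` and forwards its watermark
                       `defer` late, so the lag accumulates per level.
    per_operator=False models holding the watermark back once at the source:
                       every downstream window sees the same fixed lag.
    """
    lag = 0 if per_operator else defer
    ts = event_ts
    emit_at = event_ts
    for size in window_sizes:
        if per_operator:
            lag += defer          # each level adds its own deferral
        end = window_end(ts, size)
        emit_at = end + lag       # source watermark needed to fire this window
        ts = end - 1              # timestamp of the record this window emits
    return emit_at


if __name__ == "__main__":
    event = 30 * MINUTE_MS
    two_levels = [HOUR_MS, HOUR_MS]
    base = final_emit_watermark(event, two_levels, 0, per_operator=True)
    stacked = final_emit_watermark(event, two_levels, HOUR_MS, per_operator=True)
    at_source = final_emit_watermark(event, two_levels, HOUR_MS, per_operator=False)
    print("added latency, per-operator deferral (h):", (stacked - base) / HOUR_MS)
    print("added latency, hold-back at source (h):", (at_source - base) / HOUR_MS)
```

Under these assumptions, two window levels with a 1 hour deferral add two hours when each operator defers on its own, and one hour when the watermark is held back once at the source. A single 1 hour window with a 15 minute deferral fires at a quarter past the hour in either model.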