Date: Sun, 10 Apr 2016 18:41:25 +0000 (UTC)
From: "Kam Kasravi (JIRA)"
To: dev@gearpump.incubator.apache.org
Subject: [jira] [Comment Edited] (GEARPUMP-23) Add DSL window (time series) support

    [ https://issues.apache.org/jira/browse/GEARPUMP-23?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15234206#comment-15234206 ]

Kam Kasravi edited comment on GEARPUMP-23 at 4/10/16 6:41 PM:
--------------------------------------------------------------

Comments from [~mauzhang] moved from https://github.com/gearpump/gearpump/issues/1034

We only have `WindowState` now, which limits the usage of windowing to exactly-once processing. Full windowing support should be oblivious to whether processing is at-most-once, at-least-once or exactly-once. Also, we need a window function in the DSL API.

Besides, [Google's Dataflow model|http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43864.pdf] introduced the concept of event-time processing, unaligned windows / sessions (while only aligned windows are supported now)

> Sessions are windows that capture some period of activity over a subset of the data, in this case per key. Typically they are defined by a timeout gap. Any events that occur within a span of time less than the timeout are grouped together as a session. Sessions are unaligned windows.

and triggers over windows for incremental processing, as well as the use cases that motivate such concepts

||use cases||requirements||
|session-based analysis for search, ads, analytics, social, and YouTube|unaligned windows|
|billing pipelines|triggers and incremental processing support via accumulation and retraction|
|aggregate statistics calculations|watermark triggers|
|abuse detection pipelines|percentile watermark triggers|
|recommendation generation|processing (system) time triggers|
|anomaly detection|data driven triggers|

It would be nice for Gearpump to have these enhancements, and it shouldn't be hard to add them since event-time processing and watermarks are already built in.

In summary, I'd like to implement the following features for windowing support:
- [ ] window processor in Graph API for all message processing semantics
- [ ] window function in DSL API for all message processing semantics
- [ ] support unaligned sessions
- [ ] trigger API that includes event-time triggers, watermark triggers and data driven triggers (not sure whether we need processing time triggers)


was (Author: kam kasravi):
Comments from [~mauzhang] moved from https://github.com/gearpump/gearpump/issues/1034

We only have `WindowState` now, which limits the usage of windowing to exactly-once processing. Full windowing support should be oblivious to whether processing is at-most-once, at-least-once or exactly-once. Also, we need a window function in the DSL API.

Besides, [Google's Dataflow model](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43864.pdf) introduced the concept of event-time processing, unaligned windows / sessions (while only aligned windows are supported now)

> Sessions are windows that capture some period of activity over a subset of the data, in this case per key. Typically they are defined by a timeout gap. Any events that occur within a span of time less than the timeout are grouped together as a session. Sessions are unaligned windows.

and triggers over windows for incremental processing, as well as the use cases that motivate such concepts

use cases | requirements
-------------- | -------------------
session-based analysis for search, ads, analytics, social, and YouTube | unaligned windows
billing pipelines | triggers and incremental processing support via accumulation and retraction
aggregate statistics calculations | watermark triggers
abuse detection pipelines | percentile watermark triggers
recommendation generation | processing (system) time triggers
anomaly detection | data driven triggers

It would be nice for Gearpump to have these enhancements, and it shouldn't be hard to add them since event-time processing and watermarks are already built in.

In summary, I'd like to implement the following features for windowing support:
- [ ] window processor in Graph API for all message processing semantics
- [ ] window function in DSL API for all message processing semantics
- [ ] support unaligned sessions
- [ ] trigger API that includes event-time triggers, watermark triggers and data driven triggers (not sure whether we need processing time triggers)


> Add DSL window (time series) support
> -------------------------------------
>
>                 Key: GEARPUMP-23
>                 URL: https://issues.apache.org/jira/browse/GEARPUMP-23
>             Project: Apache Gearpump
>          Issue Type: New Feature
>          Components: streaming
>    Affects Versions: 0.8.0
>            Reporter: Kam Kasravi
>             Fix For: 0.8.1
>
>
> Time series support is needed for both GEARPUMP-21 and GEARPUMP-22. This was originally GitHub issue [1917|https://github.com/gearpump/gearpump/issues/1917]. Other realtime streaming engines support time windows, such as [Flink|https://flink.apache.org/news/2015/12/04/Introducing-windows.html], [Beam|https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101] and a future release of [Spark|https://issues.apache.org/jira/secure/attachment/12793410/StructuredStreamingProgrammingAbstractionSemanticsandAPIs-ApacheJIRA.pdf].
> A comparison matrix is found [here|https://docs.google.com/spreadsheets/d/1OM077lZBARrtUi6g0X0O0PHaIbFKCD6v0djRefQRE1I/edit#gid=995660187]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
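
For readers unfamiliar with the session definition quoted in the comment (per-key windows closed by a timeout gap), here is a minimal sketch of that grouping rule. It is not Gearpump code and does not use any Gearpump API: the function name `sessionize`, the `(key, event_time_ms)` event shape and the `gap_ms` parameter are all assumptions made for illustration. It also works on a finite batch of events, so it ignores watermarks, triggers and late data, which a real streaming implementation would need to handle.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

# Hypothetical event shape for this sketch: (key, event time in milliseconds).
Event = Tuple[Hashable, int]


def sessionize(events: Iterable[Event], gap_ms: int) -> Dict[Hashable, List[List[int]]]:
    """Group event times per key into sessions.

    Two consecutive events of the same key belong to the same session when
    the time between them is strictly less than gap_ms; otherwise a new
    session starts. Sessions of different keys are independent, which is
    what makes the resulting windows unaligned.
    """
    # Collect event times per key.
    times_by_key: Dict[Hashable, List[int]] = defaultdict(list)
    for key, event_time in events:
        times_by_key[key].append(event_time)

    sessions: Dict[Hashable, List[List[int]]] = {}
    for key, times in times_by_key.items():
        times.sort()  # order by event time, not arrival order
        key_sessions = [[times[0]]]
        for event_time in times[1:]:
            last_time = key_sessions[-1][-1]
            if event_time - last_time < gap_ms:
                key_sessions[-1].append(event_time)  # still inside the gap
            else:
                key_sessions.append([event_time])  # gap exceeded: new session
        sessions[key] = key_sessions
    return sessions


if __name__ == "__main__":
    sample = [("a", 0), ("a", 5), ("b", 3), ("a", 30), ("b", 4), ("a", 34)]
    print(sessionize(sample, gap_ms=10))
```

With a 10 ms gap, key `a` in the sample splits into two sessions because of the 25 ms pause between its second and third events, while key `b` stays in one session. The window boundaries differ per key, which is the "unaligned" property the comment refers to.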