From: Lukasz Cwik
Date: Fri, 21 Apr 2017 12:36:35 -0700
Subject: Re: Towards a spec for robust streaming SQL, Part 1
To: dev@beam.apache.org
Cc: dev@flink.apache.org, dev@calcite.apache.org

The doc is a good read.

I think you do a great job of explaining table -> stream, stream -> stream,
and stream -> table when there is only one stream. But when there are
multiple streams reading/writing to a table, how does that impact what
occurs? For example, with CoGBK you have multiple streams writing to a
table; how does that impact window merging?

On Thu, Apr 20, 2017 at 5:57 PM, Tyler Akidau wrote:

> Hello Beam, Calcite, and Flink dev lists!
>
> Apologies for the big cross post, but I thought this might be something
> all three communities would find relevant.
>
> Beam is finally making progress on a SQL DSL utilizing Calcite, thanks to
> Mingmin Xu. As you can imagine, we need to come to some conclusion about
> how to elegantly support the full suite of streaming functionality in the
> Beam model via Calcite SQL. You folks in the Flink community have been
> pushing on this (e.g., adding windowing constructs, amongst others, thank
> you! :-), but from my understanding we still don't have a full spec for
> how to support robust streaming in SQL (including but not limited to,
> e.g., a triggers analogue such as EMIT).
>
> I've been spending a lot of time thinking about this and have some
> opinions about how I think it should look that I've already written down,
> so I volunteered to try to drive forward agreement on a general streaming
> SQL spec between our three communities (well, technically I volunteered to
> do that w/ Beam and Calcite, but I figured you Flink folks might want to
> join in since you're going that direction already anyway and will have
> useful insights :-).
>
> My plan was to do this by sharing two docs:
>
>    1. The Beam Model : Streams & Tables - This one is for context, and
>    really only mentions SQL in passing. But it describes the relationship
>    between the Beam Model and the "streams & tables" way of thinking,
>    which turns out to be useful in understanding what robust streaming in
>    SQL might look like. Many of you probably already know some or all of
>    what's in here, but I felt it was necessary to have it all written down
>    in order to justify some of the proposals I wanted to make in the
>    second doc.
>
>    2. A streaming SQL spec for Calcite - The goal for this doc is that it
>    would become a general specification for what robust streaming SQL in
>    Calcite should look like. It would start out as a basic proposal of
>    what things *could* look like (combining both what things look like now
>    as well as a set of proposed changes for the future), and we could all
>    iterate on it together until we get to something we're happy with.
>
> At this point, I have doc #1 ready, and it's a bit of a monster, so I
> figured I'd share it and let folks hack at it with comments if they have
> any, while I try to get the second doc ready in the meantime. As part of
> getting doc #2 ready, I'll be starting a separate thread to try to gather
> input on what things are already in flight for streaming SQL across the
> various communities, to make sure the proposal captures everything that's
> going on as accurately as it can.
>
> If you have any questions or comments, I'm interested to hear them.
> Otherwise, here's doc #1, "The Beam Model : Streams & Tables":
>
>   http://s.apache.org/beam-streams-tables
>
> -Tyler