From dev-return-9508-archive-asf-public=cust-asf.ponee.io@beam.apache.org Thu May 3 20:11:00 2018 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by mx-eu-01.ponee.io (Postfix) with SMTP id B4475180625 for ; Thu, 3 May 2018 20:10:59 +0200 (CEST) Received: (qmail 28941 invoked by uid 500); 3 May 2018 18:10:58 -0000 Mailing-List: contact dev-help@beam.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@beam.apache.org Delivered-To: mailing list dev@beam.apache.org Received: (qmail 28931 invoked by uid 99); 3 May 2018 18:10:57 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 03 May 2018 18:10:57 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id 805BE180787 for ; Thu, 3 May 2018 18:10:57 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 1.888 X-Spam-Level: * X-Spam-Status: No, score=1.888 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=2, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_PASS=-0.001, T_RP_MATCHES_RCVD=-0.01] autolearn=disabled Authentication-Results: spamd3-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=google.com Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id S7WkuuopsYTW for ; Thu, 3 May 2018 18:10:55 +0000 (UTC) Received: from mail-oi0-f48.google.com (mail-oi0-f48.google.com [209.85.218.48]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id 30CD75FB51 for ; Thu, 3 May 2018 18:10:55 +0000 (UTC) Received: by mail-oi0-f48.google.com with SMTP id c203-v6so16927628oib.7 for ; Thu, 03 May 2018 11:10:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to; bh=SCayWjbAOB65BdkD7Hl72DHTqPsnOg5Pmi+L/aBYRB0=; b=B6TkOwbeWZR4QQaDZ9uFY7sASNw+Mc4rIPDO3v02vlyjLFLbOqcVlKoJSyMQmUmc/v 193q66NCCsNjyTseeMKnmLq0DnnV1wMBU6izvosMfioXA00kPvEcilAYyFK/naZ74JXn fF6szTKRpzzW8GAfN1zzBu7pA2fRPIIUGWMKp3ClVO/RJzRU8qFcl35zxDHjfH0aWg6B 8pSt3Ndb7IcXepeLnXD6x5OqjTCTM/N1m9ouJG0tvey1IPrSgKMv54/240r48nrsmmzK 7jA/y2xI701UCBmkB/eGF0LevabWsvjLLlQl5wEVdGuVUtDSN+3454O+yA6os47x0s0F xNFw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to; bh=SCayWjbAOB65BdkD7Hl72DHTqPsnOg5Pmi+L/aBYRB0=; b=Ue8yfcFJp9TataDKxfrvXyTyJ7CKD9t9q6MiDBkbBWpJkHEo3uT43ihfLrFHwJQgu5 NNRh44mEsrsQtWijBNBManU8FjzDmvil7JnqnlnvEwz5/q5s1JBhqB67uXroz2B/X5Ov q+HyVWThNudaHkHZZOIHmjwnq4xu9acq9pqRBylOAfhgBCFJffAcGVlUrZnT3U9Uy/nL dkcDU5mljQK71m7ydLk9lvf3zfZpY+TyZH2UcQ9TjovfiTuEFR4fbiahafHMsp6ZEoDZ lTq1lpK+emY5N31o6tbwBZFlOePDC3OfHzj1Mv14TzAdE4wfkZnrL2djURngT6PBSWQm p5ew== X-Gm-Message-State: ALQs6tAUKPe2BGHQuYVoQPn8VtoYfkPERjHd2RCnWCMEeka8wUXSCbTp 3Og+TwyGkdWkg458srB/JZCj2M3DSNZ46I4snInrDys+hj4= X-Google-Smtp-Source: AB8JxZqBaze+sKmVnJpZc6ZIO8NvwNAdbQiEHbPSvJPu3STKRkjLKGJ1bIceXG9QxXXFm/tIYOv1JKOD2hBIWoHD0ww= X-Received: by 2002:aca:6908:: with SMTP id e8-v6mr15636057oic.217.1525371053477; Thu, 03 May 2018 11:10:53 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Andrew Pilloud Date: Thu, 03 May 2018 18:10:42 +0000 Message-ID: Subject: Re: Pubsub to Beam SQL To: dev@beam.apache.org Content-Type: multipart/alternative; boundary="000000000000cb9213056b511f1b" --000000000000cb9213056b511f1b Content-Type: text/plain; charset="UTF-8" This sounds awesome! Is event timestamp something that we need to specify for every source? If so, I would suggest we add this as a first class option on CREATE TABLE rather then something hidden in TBLPROPERTIES. Andrew On Wed, May 2, 2018 at 10:30 AM Anton Kedin wrote: > Hi > > I am working on adding functionality to support querying Pubsub messages > directly from Beam SQL. > > *Goal* > Provide Beam users a pure SQL solution to create the pipelines with > Pubsub as a data source, without the need to set up the pipelines in Java > before applying the query. > > *High level approach* > > - > - Build on top of PubsubIO; > - Pubsub source will be declared using CREATE TABLE DDL statement: > - Beam SQL already supports declaring sources like Kafka and Text > using CREATE TABLE DDL; > - it supports additional configuration using TBLPROPERTIES clause. > Currently it takes a text blob, where we can put a JSON configuration; > - wrapping PubsubIO into a similar source looks feasible; > - The plan is to initially support messages only with JSON payload: > - > - more payload formats can be added later; > - Messages will be fully described in the CREATE TABLE statements: > - event timestamps. Source of the timestamp is configurable. It is > required by Beam SQL to have an explicit timestamp column for windowing > support; > - messages attributes map; > - JSON payload schema; > - Event timestamps will be taken either from publish time or > user-specified message attribute (configurable); > > Thoughts, ideas, comments? > > More details are in the doc here: > https://docs.google.com/document/d/1wIXTxh-nQ3u694XbF0iEZX_7-b3yi4ad0ML2pcAxYfE > > > Thank you, > Anton > --000000000000cb9213056b511f1b Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
This sounds awesome!

Is event timestamp= something that we need to specify for every source? If so, I would suggest= we add this as a first class option on CREATE TABLE rather then something = hidden in=C2=A0TBLPROPERTIES.

Andrew
On Wed, May 2, 2018 at 10:30 = AM Anton Kedin <kedin@google.com= > wrote:
Hi
I am working on adding functionality to support querying Pu= bsub messages directly from Beam SQL.

Goal
=C2=A0 Provide= Beam users a pure=C2=A0 SQL solution to create the pipelines with Pubsub a= s a data source,=C2=A0without the need to set up the pip= elines in Java before applying the query.

=
High level approach
  • Build on top of PubsubIO;
  • Pubsub source will be decl= ared using CREATE TABLE DDL statement:
    • Beam SQL already support= s declaring sources like Kafka and Text using CREATE TABLE DDL;
    • it = supports additional configuration using TBLPROPERTIES clause. Currently it = takes a text blob, where we can put a JSON configuration;
    • wrapping = PubsubIO into a similar source looks feasible;
  • The plan is to = initially support messages only with JSON payload:
    • more payload formats can be added later;
  • Messages will be ful= ly described in the CREATE TABLE statements:
    • event timestamps. = Source of the timestamp is configurable. It is required by Beam SQL to have= an explicit timestamp column for windowing support;
    • messages attri= butes map;
    • JSON payload schema;
  • Event timestamps will = be taken either from publish time or user-specified message attribute (conf= igurable);
  • Thoughts, ideas, comments?=C2=A0

    More details are in the doc here:=C2=A0= https://docs.google.com/document/d/1wIXTxh-nQ3u694XbF0iEZX_7-b3yi4ad0ML2pcA= xYfE=C2=A0

    Thank you,
    Anton
    --000000000000cb9213056b511f1b--