From: Andrew Pilloud
Date: Thu, 03 May 2018 17:41:37 +0000
Subject: Re: [SQL] Reconciling Beam SQL Environments with Calcite Schema
To: dev@beam.apache.org
Reply-To: dev@beam.apache.org
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="000000000000c93f16056b50b79e"

--000000000000c93f16056b50b79e
Content-Type: text/plain; charset="UTF-8"

Ok, I've finished with this change. I didn't get reviews on the early
cleanup PRs, so I've pushed all these changes into the first cleanup PR:
https://github.com/apache/beam/pull/5224

Andrew

On Tue, May 1, 2018 at 10:35 AM Andrew Pilloud wrote:

> I'm just starting to move forward on this. Looking at my team's short-term
> needs for SQL, option one would be good enough; however, I agree with Kenn
> that we want something like option two eventually. I also don't want to
> break existing users, and it sounds like there is at least one custom
> MetaStore not in Beam. So my plan is to go with option two and simplify
> the interface where no functionality will be lost.
>
> There is a common set of operations between the MetaStore and the
> TableProvider. I'd like to make MetaStore inherit the interface of
> TableProvider. Most operations we need (createTable, dropTable,
> listTables) are already identical between the two, so this will have no
> impact on custom implementations. The buildBeamSqlTable operation does
> differ: the MetaStore takes a table name, the TableProvider takes a table
> object. However, everything calling this API already has the full table
> object, so I would like to simplify this interface by passing the table
> object in both cases. Objections?
>
> Andrew
>
> On Tue, Apr 24, 2018 at 9:27 AM James wrote:
>
>> Kenn: yes, MetaStore is user-facing. Users can choose to implement their
>> own MetaStore; currently there is only an InMemory implementation in the
>> Beam codebase.
>>
>> Andrew: I like the second option, since it "retains the ability for DDL
>> operations to be processed by a custom MetaStore." IMO we should have
>> the DDL ability as part of a fully functional SQL.
>>
>> On Tue, Apr 24, 2018 at 10:28 PM Kenneth Knowles wrote:
>>
>>> Can you say more about how the metastore is used? I presume it is or
>>> will be user-facing, so are Beam SQL users already providing their own?
>>>
>>> I'm sure we want something like that eventually to support things like
>>> Apache Atlas and HCatalog, IIUC for the "create if needed" logic when
>>> using Beam SQL to create a derived data set. But I don't think we
>>> should build out those code paths until we have at least one
>>> non-in-memory implementation.
>>>
>>> Just a really high level $0.02.
>>>
>>> Kenn
>>>
>>> On Mon, Apr 23, 2018 at 4:56 PM Andrew Pilloud wrote:
>>>
>>>> I'm working on updating our Beam DDL code to use the DDL execution
>>>> functionality that recently merged into core Calcite. This enables us
>>>> to take advantage of Calcite JDBC as a way to use Beam SQL. As part of
>>>> that I need to reconcile the Beam SQL Environments with the Calcite
>>>> Schema (which is Calcite's environment). We currently have copies of
>>>> our tables in the Beam meta/store, Calcite Schema, BeamSqlEnv, and
>>>> BeamQueryPlanner. I have a pending PR which merges the latter two to
>>>> just use the Calcite Schema copy. Merging the Beam MetaStore and
>>>> Calcite Schema isn't as simple. I have two options I'm looking for
>>>> feedback on:
>>>>
>>>> 1. Make Calcite Schema authoritative and demote MetaStore to be
>>>> something more like a Calcite TableFactory. Calcite Schema already
>>>> implements the semantics of our InMemoryMetaStore. If the Store
>>>> interface is just overbuilt, this approach would result in a
>>>> significant reduction in code. This would, however, eliminate the CRUD
>>>> part of the interface, leaving just the buildBeamSqlTable function.
>>>>
>>>> 2. Pass the Beam MetaStore into Calcite wrapped with a class
>>>> translating to Calcite Schema (like we do already with tables).
>>>> Instead of copying tables into the Calcite Schema, we would pass in
>>>> Beam meta/store as the source of truth, and Calcite would manipulate
>>>> tables directly in the Beam meta/store. This is a bit more complicated
>>>> but retains the ability for DDL operations to be processed by a custom
>>>> MetaStore.
>>>>
>>>> Thoughts?
>>>>
>>>> Andrew
>>>>
>>>

--000000000000c93f16056b50b79e
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
--000000000000c93f16056b50b79e--
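
The interface change proposed in the May 1 message (MetaStore inheriting the TableProvider operations, with buildBeamSqlTable taking the table object in both cases) could look roughly like the sketch below. This is only an illustration under stated assumptions: Table, BeamSqlTable, the method signatures, and the wrapper class MetaStoreSketch are simplified, hypothetical stand-ins, not the actual Beam SQL classes or the code in the pull request.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins; NOT the real Beam SQL types or signatures.
final class MetaStoreSketch {

  /** Assumed minimal table description: a name plus a type tag. */
  static final class Table {
    final String name;
    final String type;

    Table(String name, String type) {
      this.name = name;
      this.type = type;
    }
  }

  /** Assumed placeholder for the object a query would read from. */
  static final class BeamSqlTable {
    final Table source;

    BeamSqlTable(Table source) {
      this.source = source;
    }
  }

  /** The operations the thread describes as common to both interfaces. */
  interface TableProvider {
    void createTable(Table table);

    void dropTable(String tableName);

    List<Table> listTables();

    // Unified form: the caller passes the full table object, not just a name.
    BeamSqlTable buildBeamSqlTable(Table table);
  }

  /** MetaStore inherits every TableProvider operation and adds nothing here. */
  interface MetaStore extends TableProvider {}

  /** Toy in-memory store, only to show that the unified interface is enough. */
  static final class InMemoryMetaStore implements MetaStore {
    private final Map<String, Table> tables = new LinkedHashMap<>();

    @Override
    public void createTable(Table table) {
      if (tables.containsKey(table.name)) {
        throw new IllegalArgumentException("Table already exists: " + table.name);
      }
      tables.put(table.name, table);
    }

    @Override
    public void dropTable(String tableName) {
      tables.remove(tableName);
    }

    @Override
    public List<Table> listTables() {
      return new ArrayList<>(tables.values());
    }

    @Override
    public BeamSqlTable buildBeamSqlTable(Table table) {
      // No lookup by name is needed because the caller already holds the table.
      return new BeamSqlTable(table);
    }
  }

  public static void main(String[] args) {
    MetaStore store = new InMemoryMetaStore();
    Table orders = new Table("orders", "text");
    store.createTable(orders);
    BeamSqlTable built = store.buildBeamSqlTable(orders);
    System.out.println("Built table for: " + built.source.name);
    System.out.println("Registered tables: " + store.listTables().size());
  }
}
```

Because MetaStore only extends TableProvider in this sketch, any code written against TableProvider also accepts a custom MetaStore, which is the compatibility property the message is aiming for.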