beam-dev mailing list archives

From Andrew Pilloud <apill...@google.com>
Subject Re: [SQL] Reconciling Beam SQL Environments with Calcite Schema
Date Thu, 03 May 2018 17:41:37 GMT
Ok, I've finished with this change. Didn't get reviews on the early cleanup
PRs, so I've pushed all these changes into the first cleanup PR:
https://github.com/apache/beam/pull/5224

Andrew

On Tue, May 1, 2018 at 10:35 AM Andrew Pilloud <apilloud@google.com> wrote:

> I'm just starting to move forward on this. Looking at my team's short-term
> needs for SQL, option one would be good enough; however, I agree with Kenn
> that we want something like option two eventually. I also don't want to
> break existing users, and it sounds like there is at least one custom
> MetaStore outside of Beam. So my plan is to go with option two and simplify
> the interface wherever that loses no functionality.
>
> There is a common set of operations between the MetaStore and the
> TableProvider. I'd like to make MetaStore inherit the interface of
> TableProvider. Most operations we need (createTable, dropTable, listTables)
> are already identical between the two, and so this will have no impact on
> custom implementations. The buildBeamSqlTable operation does differ: the
> MetaStore takes a table name, the TableProvider takes a table object.
> However, everything calling this API already has the full table object, so I
> would like to simplify this interface by passing the table object in both
> cases. Objections?
>
> Andrew
>
> On Tue, Apr 24, 2018 at 9:27 AM James <xumingmingv@gmail.com> wrote:
>
>> Kenn: yes, MetaStore is user-facing. Users can choose to implement their
>> own MetaStore; currently there is only an in-memory implementation in the
>> Beam codebase.
>>
>> Andrew: I like the second option, since it "retains the ability for DDL
>> operations to be processed by a custom MetaStore." IMO a fully functional
>> SQL should have DDL ability.
>>
>> On Tue, Apr 24, 2018 at 10:28 PM Kenneth Knowles <klk@google.com> wrote:
>>
>>> Can you say more about how the metastore is used? I presume it is or
>>> will be user-facing, so are Beam SQL users already providing their own?
>>>
>>> I'm sure we want something like that eventually to support things like
>>> Apache Atlas and HCatalog, IIUC for the "create if needed" logic when using
>>> Beam SQL to create a derived data set. But I don't think we should build
>>> out those code paths until we have at least one non-in-memory
>>> implementation.
>>>
>>> Just a really high level $0.02.
>>>
>>> Kenn
>>>
>>> On Mon, Apr 23, 2018 at 4:56 PM Andrew Pilloud <apilloud@google.com>
>>> wrote:
>>>
>>>> I'm working on updating our Beam DDL code to use the DDL execution
>>>> functionality that recently merged into core Calcite. This enables us to
>>>> take advantage of Calcite JDBC as a way to use Beam SQL. As part of that I
>>>> need to reconcile the Beam SQL Environments with the Calcite Schema (which
>>>> is Calcite's environment). We currently have copies of our tables in the
>>>> Beam meta/store, Calcite Schema, BeamSqlEnv, and BeamQueryPlanner. I have a
>>>> pending PR which merges the latter two to just use the Calcite Schema copy.
>>>> Merging the Beam MetaStore and Calcite Schema isn't as simple. I have
>>>> two options I'm looking for feedback on:
>>>>
>>>> 1. Make Calcite Schema authoritative and demote MetaStore to be
>>>> something more like a Calcite TableFactory. Calcite Schema already
>>>> implements the semantics of our InMemoryMetaStore. If the MetaStore interface
>>>> is just overbuilt, this approach would result in a significant reduction
>>>> in code. This would however eliminate the CRUD part of the interface
>>>> leaving just the buildBeamSqlTable function.
>>>>
>>>> 2. Pass the Beam MetaStore into Calcite wrapped with a class
>>>> translating to Calcite Schema (like we do already with tables). Instead of
>>>> copying tables into the Calcite Schema we would pass in Beam meta/store as
>>>> the source of truth and Calcite would manipulate tables directly in the
>>>> Beam meta/store. This is a bit more complicated but retains the ability for
>>>> DDL operations to be processed by a custom MetaStore.
>>>>
>>>> Thoughts?
>>>>
>>>> Andrew
>>>>
>>>
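
The interface change proposed in the May 1 message (MetaStore inheriting from TableProvider, with buildBeamSqlTable taking the table object in both cases) can be sketched roughly as below. This is only an illustration under stated assumptions: the Table and BeamSqlTable classes are simplified stand-ins, the method signatures are guesses based on the operation names in the thread, and the resolve helper is hypothetical. None of it is copied from the Beam code base or from the PR.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public final class MetaStoreSketch {

  /** Hypothetical, simplified table definition: a name plus a type tag. */
  static final class Table {
    final String name;
    final String type;

    Table(String name, String type) {
      this.name = name;
      this.type = type;
    }
  }

  /** Hypothetical stand-in for the executable table built from a definition. */
  static final class BeamSqlTable {
    final String description;

    BeamSqlTable(String description) {
      this.description = description;
    }
  }

  /** The operations the thread names as common to both interfaces (signatures assumed). */
  interface TableProvider {
    void createTable(Table table);

    void dropTable(String tableName);

    Map<String, Table> listTables();

    // Unified form: takes the table object rather than a table name.
    BeamSqlTable buildBeamSqlTable(Table table);
  }

  /** After the change a MetaStore is a TableProvider; store-only extras would go here. */
  interface MetaStore extends TableProvider {}

  /** Minimal in-memory store, written for this sketch only. */
  static final class InMemoryMetaStore implements MetaStore {
    private final Map<String, Table> tables = new LinkedHashMap<>();

    @Override
    public void createTable(Table table) {
      if (tables.putIfAbsent(table.name, table) != null) {
        throw new IllegalArgumentException("Table already exists: " + table.name);
      }
    }

    @Override
    public void dropTable(String tableName) {
      tables.remove(tableName);
    }

    @Override
    public Map<String, Table> listTables() {
      return Collections.unmodifiableMap(tables);
    }

    @Override
    public BeamSqlTable buildBeamSqlTable(Table table) {
      return new BeamSqlTable(table.type + ":" + table.name);
    }
  }

  /** Hypothetical caller written against TableProvider; it accepts a MetaStore unchanged. */
  static BeamSqlTable resolve(TableProvider source, String tableName) {
    Table table = source.listTables().get(tableName);
    if (table == null) {
      throw new IllegalArgumentException("Unknown table: " + tableName);
    }
    // The caller already holds the full table object, so it passes that along.
    return source.buildBeamSqlTable(table);
  }

  public static void main(String[] args) {
    MetaStore store = new InMemoryMetaStore();
    store.createTable(new Table("orders", "text"));
    System.out.println(resolve(store, "orders").description);
    store.dropTable("orders");
    System.out.println(store.listTables().isEmpty());
  }
}
```

Because MetaStore only extends TableProvider here, a custom store that already implements the shared operations would need to change just its buildBeamSqlTable signature, which matches the limited impact on custom implementations described in the thread.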
