beam-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrew Pilloud <apill...@google.com>
Subject Re: [SQL] Reconciling Beam SQL Environments with Calcite Schema
Date Fri, 04 May 2018 22:38:06 GMT
Reviews are wrapping up, this will probably merge Monday if I don't hear
from anyone else. One more TableProvider API change after review feedback:
getTables now returns Map<String, Table> instead of Set<Table>.

Andrew

On Thu, May 3, 2018 at 10:41 AM Andrew Pilloud <apilloud@google.com> wrote:

> Ok, I've finished with this change. Didn't get reviews on the early
> cleanup PRs, so I've pushed all these changes into the first cleanup PR:
> https://github.com/apache/beam/pull/5224
>
> Andrew
>
> On Tue, May 1, 2018 at 10:35 AM Andrew Pilloud <apilloud@google.com>
> wrote:
>
>> I'm just starting to move forward on this. Looking at my team's short
>> term needs for SQL, option one would be good enough, however I agree with
>> Kenn that we want something like option two eventually. I also don't want
>> to break existing users and it sounds like there is at least one custom
>> MetaStore not in beam. So my plan is to go with option two and simplify the
>> interface where functionality loss will not result.
>>
>> There is a common set of operations between the MetaStore and the
>> TableProvider. I'd like to make MetaStore inherit the interface of
>> TableProvider. Most operations we need (createTable, dropTable, listTables)
>> are already identical between the two, and so this will have no impact on
>> custom implementations. The buildBeamSqlTable operation does differ: the
>> MetaStore takes a table name, the TableProvider takes a table object.
>> However everything calling this API already has the full table object, so I
>> would like to simplify this interface by passing the table object in both
>> cases. Objections?
>>
>> Andrew
>>
>> On Tue, Apr 24, 2018 at 9:27 AM James <xumingmingv@gmail.com> wrote:
>>
>>> Kenn: yes, MetaStore is user-facing, Users can choose to implement their
>>> own MetaStore, currently only an InMemory implementation in Beam CodeBase.
>>>
>>> Andrew: I like the second option, since it "retains the ability for DDL
>>> operations to be processed by a custom MetaStore.", IMO we should have the
>>> DDL ability as a fully functional SQL.
>>>
>>> On Tue, Apr 24, 2018 at 10:28 PM Kenneth Knowles <klk@google.com> wrote:
>>>
>>>> Can you say more about how the metastore is used? I presume it is or
>>>> will be user-facing, so are Beam SQL users already providing their own?
>>>>
>>>> I'm sure we want something like that eventually to support things like
>>>> Apache Atlas and HCatalog, IIUC for the "create if needed" logic when using
>>>> Beam SQL to create a derived data set. But I don't think we should build
>>>> out those code paths until we have at least one non-in-memory
>>>> implementation.
>>>>
>>>> Just a really high level $0.02.
>>>>
>>>> Kenn
>>>>
>>>> On Mon, Apr 23, 2018 at 4:56 PM Andrew Pilloud <apilloud@google.com>
>>>> wrote:
>>>>
>>>>> I'm working on updating our Beam DDL code to use the DDL execution
>>>>> functionality that recently merged into core calcite. This enables us
to
>>>>> take advantage of Calcite JDBC as a way to use Beam SQL. As part of that
I
>>>>> need to reconcile the Beam SQL Environments with the Calcite Schema (which
>>>>> is calcite's environment). We currently have copies of our tables in
the
>>>>> Beam meta/store, Calcite Schema, BeamSqlEnv, and BeamQueryPlanner. I
have a
>>>>> pending PR which merges the later two to just use the Calcite Schema
copy.
>>>>> Merging the Beam MetaStore and Calcite Schema isn't as simple. I have
>>>>> two options I'm looking for feedback on:
>>>>>
>>>>> 1. Make Calcite Schema authoritative and demote MetaStore to be
>>>>> something more like a Calcite TableFactory. Calcite Schema already
>>>>> implements the semantics of our InMemoryMetaStore. If the Store interface
>>>>> is just over built, this approach would result in a significant reduction
>>>>> in code. This would however eliminate the CRUD part of the interface
>>>>> leaving just the buildBeamSqlTable function.
>>>>>
>>>>> 2. Pass the Beam MetaStore into Calcite wrapped with a class
>>>>> translating to Calcite Schema (like we do already with tables). Instead
of
>>>>> copying tables into the Calcite Schema we would pass in Beam meta/store
as
>>>>> the source of truth and Calcite would manipulate tables directly in the
>>>>> Beam meta/store. This is a bit more complicated but retains the ability
for
>>>>> DDL operations to be processed by a custom MetaStore.
>>>>>
>>>>> Thoughts?
>>>>>
>>>>> Andrew
>>>>>
>>>>

Mime
View raw message