drill-dev mailing list archives

From Julien Le Dem <jul...@dremio.com>
Subject Re: select from table with options
Date Fri, 06 Nov 2015 01:00:38 GMT
TL;DR: TableMacro works for me; I need help with a bug in Calcite when
there's more than 1 function with the same name.

FYI: I have a prototype of TableMacro working in Drill. For now it only
lets you specify the delimiter for csv files.
So it seems the answer to my question 1) is that TableMacros are the way to
go.
I'm still wondering about *3) is the table(...) wrapping syntax necessary?*
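
To make the plan-time idea concrete, here is a minimal sketch in plain Java with no Calcite or Drill dependency. The class DelimitedFileMacroSketch, the nested TableSpec, and the apply method are hypothetical names chosen for illustration; they are not Calcite's TableMacro interface or the prototype's code. The comma default for an omitted delimiter is also an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a plan-time table macro; not Calcite's TableMacro. */
public class DelimitedFileMacroSketch {

    /** Hypothetical description of the table the macro would produce. */
    static final class TableSpec {
        final String path;
        final char delimiter;

        TableSpec(String path, char delimiter) {
            this.path = path;
            this.delimiter = delimiter;
        }
    }

    /** Turns named arguments into a table description; no data is read here. */
    static TableSpec apply(Map<String, String> namedArgs) {
        String path = namedArgs.get("path");
        if (path == null) {
            throw new IllegalArgumentException("missing required parameter: path");
        }
        // Assumption: fall back to a comma when no delimiter is given.
        String delimiter = namedArgs.getOrDefault("delimiter", ",");
        if (delimiter.length() != 1) {
            throw new IllegalArgumentException("delimiter must be a single character");
        }
        return new TableSpec(path, delimiter.charAt(0));
    }

    public static void main(String[] args) {
        // Mirrors: delimitedFile(path => '/path/to/file', delimiter => '|')
        Map<String, String> namedArgs = new LinkedHashMap<>();
        namedArgs.put("path", "/path/to/file");
        namedArgs.put("delimiter", "|");
        TableSpec spec = apply(namedArgs);
        System.out.println(spec.path + " split on '" + spec.delimiter + "'");
    }
}
```

The point of the sketch is only that everything needed to describe the table is known from the call's arguments, so it can be settled in the planner.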

I had to fix some things in Calcite to enable this:
https://github.com/dremio/calcite/pull/1/files
Drill uses Frameworks.getPlanner(), which does not seem to be exercised by
the Maze example in Calcite.
That is why some hooks were missing.

I think I found a bug in Calcite but I'd need help to fix it.
Here is a test that reproduces the problem:
https://github.com/apache/calcite/pull/166
If we return more than 1 TableFunction with the same name, we get an NPE
later on.
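
For readers unfamiliar with the situation, here is a minimal sketch in plain Java of a lookup that has to choose among several functions registered under one name, using the set of named arguments to pick one. SameNameFunctionsSketch, Signature, register, and resolve are hypothetical names; this is not Calcite's resolution code and it does not reproduce the NPE. It only shows a lookup that returns an empty result, rather than null, when nothing matches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical registry illustrating same-name lookup; not Calcite's resolver. */
public class SameNameFunctionsSketch {

    /** A function name plus the names of the parameters it declares. */
    static final class Signature {
        final String name;
        final Set<String> paramNames;

        Signature(String name, String... paramNames) {
            this.name = name;
            this.paramNames = new HashSet<>(Arrays.asList(paramNames));
        }
    }

    private final Map<String, List<Signature>> byName = new HashMap<>();

    void register(Signature s) {
        byName.computeIfAbsent(s.name, k -> new ArrayList<>()).add(s);
    }

    /**
     * Returns the single candidate whose parameter names equal the supplied
     * argument names, or empty if there is no match or more than one.
     */
    Optional<Signature> resolve(String name, Set<String> argNames) {
        Signature match = null;
        List<Signature> candidates =
                byName.getOrDefault(name, Collections.<Signature>emptyList());
        for (Signature s : candidates) {
            if (s.paramNames.equals(argNames)) {
                if (match != null) {
                    return Optional.empty(); // ambiguous: two identical signatures
                }
                match = s;
            }
        }
        return Optional.ofNullable(match);
    }

    public static void main(String[] args) {
        SameNameFunctionsSketch registry = new SameNameFunctionsSketch();
        registry.register(new Signature("delimitedFile", "path", "delimiter"));
        registry.register(new Signature("delimitedFile", "path"));

        Set<String> argNames = new HashSet<>(Arrays.asList("path", "delimiter"));
        Optional<Signature> chosen = registry.resolve("delimitedFile", argNames);
        System.out.println(chosen.isPresent() ? "resolved" : "no match");
    }
}
```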


On Wed, Nov 4, 2015 at 9:49 AM, Julien Le Dem <julien@dremio.com> wrote:

> Looking in more detail, the DrillTable already has toRel implemented; I
> just need to add "implements TranslatableTable".
> I'll try to implement TableMacro and see what happens.
>
> On Tue, Nov 3, 2015 at 6:07 PM, Julien Le Dem <julien@dremio.com> wrote:
>
>> FYI: here are the changes to Calcite I made on the Drill fork:
>> https://github.com/mapr/incubator-calcite/pull/4/files
>> I'll port them to the Calcite master.
>>
>> On Tue, Nov 3, 2015 at 5:17 PM, Julien Le Dem <julien@dremio.com> wrote:
>>
>>> Thanks Julian,
>>> I have looked into using Table Functions in Drill. I had to make some
>>> modifications to the planner so that the function lookup in the Storage
>>> plugin works. I will submit a patch for that.
>>>
>>> I had a few questions:
>>>  *1)* For this particular use case it seems that we could use
>>> TableMacro, since all the logic can happen in the planner. Should I look
>>> into that?
>>>    - Drill Schema returns a DrillTable (which implements Table).
>>>    - A TableMacro returns a TranslatableTable
>>>    - It is not clear to me what a TableFunction returns as it defines
>>> only methods that return types.
>>>  Ideally I'd like to produce a DrillTable the way getTable in Schema does;
>>> the only difference from getTable is that we use the function parameters
>>> when producing the table.
>>> For reference, Drill's getTable is here:
>>>
>>> https://github.com/apache/drill/blob/bb69f2202ed6115b39bd8681e59c6ff6091e9b9e/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java#L235
>>> It indirectly calls:
>>>
>>> https://github.com/apache/drill/blob/bb69f2202ed6115b39bd8681e59c6ff6091e9b9e/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java#L317
>>>
>>>  *2)* The getFunctions method in Schema does not seem to be aware at
>>> all of the context it is called in. I would want to return different
>>> functions depending on where we are in the query (table functions in the
>>> from clause, regular functions in where). Is there a way to know if we are
>>> in the context of a FROM or a WHERE clause?
>>>
>>>  *3)* is the table(...) wrapping syntax necessary?
>>> Note:
>>>   - In Drill, backticks are used for identifiers containing a dot or slash,
>>> like the path to a file used as a table name: dfs.`/path/to/file.ext`
>>>   - single quotes are used to delimit strings: 'my string passed as a
>>> parameter'
>>>
>>>   The current syntax is something like:
>>>      select * from table(dfs.delimitedFile(path => '/path/to/file', delimiter => '|'))
>>>      select * from table(dfs.`/path/to/file`(type => 'text', delimiter => '|'))
>>>      select * from table(dfs.`/path/to/file`(type => 'json'))
>>> It seems that table(...) is redundant since we are in the from clause.
>>>  It could simply be:
>>>      select * from dfs.delimitedFile(path => '/path/to/file', delimiter => '|')
>>>      select * from dfs.`/path/to/file`(type => 'text', delimiter => '|')
>>>      select * from dfs.`/path/to/file`(type => 'json')
>>>
>>>  *4)* Can a table be a parameter? If yes, how do we declare a table
>>> parameter? (note the backticks instead of single quotes)
>>>      select * from dfs.delimitedFile(table => dfs.`/path/to/file`, delimiter => '|')
>>>
>>> Thank you!
>>> Julien
>>>
>>>
>>> On Sun, Nov 1, 2015 at 8:54 AM, Julian Hyde <jhyde@apache.org> wrote:
>>>
>>>> On Sun, Oct 25, 2015 at 10:13 PM, Jacques Nadeau <jacques@dremio.com>
>>>> wrote:
>>>> > Agreed. We need both select with options and .drill (by etl process
>>>> > or by sql ascribe metadata).
>>>> >
>>>> > Let's start with the select with options. My only goal would be to
>>>> > make sure that creation of a .drill file through SQL uses a similar
>>>> > pattern to the select with options. It is also important that table
>>>> > names are still expressed as identifiers instead of strings (people
>>>> > already have enough trouble remembering whether to use single quotes
>>>> > or backticks). If the table function approach is everybody's
>>>> > preferred approach, I think it is important to have named parameters
>>>> > per Julian's notes.
>>>> >
>>>> > @Julian, how hard do you think it will be to add named parameters?
>>>>
>>>> I just checked in a fix for
>>>> https://issues.apache.org/jira/browse/CALCITE-941. Check it out.
>>>>
>>>
>>>
>>>
>>> --
>>> Julien
>>>
>>
>>
>>
>> --
>> Julien
>>
>
>
>
> --
> Julien
>



-- 
Julien
