drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Julian Hyde <jh...@apache.org>
Subject Re: select from table with options
Date Fri, 13 Nov 2015 04:34:33 GMT
You’re hitting the grammar ambiguity I expected.

I think that base Calcite should require the full verbose syntax: the TABLE keyword for table
functions and the EXTEND keyword for extends clauses. Then Drill can override to make TABLE
optional, and Phoenix can override to make EXTEND optional.

Are you changing the parser in your forked copy of Calcite, or are you changing Drill’s
extensions to that parser?

If the former, you (or I) should add extension points to Calcite’s parser make the TABLE
keyword optional and to make the EXTEND keyword optional. No project should enable both extension
points — otherwise they’ll end up with an ambiguous grammar. If you agree create a Calcite
JIRA case for this.

Julian

 
> On Nov 11, 2015, at 1:55 PM, Julien Le Dem <julien@dremio.com> wrote:
> 
> Hi,
> I've been trying to enable this but it looks like in the current grammar
> (before my change) you can not use table functions and EXTEND together.
> That's because they are on difference branches of an | in the grammar.
> So I would suggest that we treat those as two separate improvement in two
> different pull requests:
> - not require table(...) to call table functions
> - allow using table functions and extend together.
> Does it make sense?
> Julien
> 
> 
> On Tue, Nov 10, 2015 at 12:51 PM, Julian Hyde <jhyde@apache.org> wrote:
> 
>> To be clear, it should be possible to use a table function with all of
>> the options -- EXTENDS clause, OVER clause, AS with alias and column
>> aliases, TABLESAMPLE.
>> 
>> I'm surprised that the parser didn't need more lookahead to choose
>> between 't (x, y)' and 't (x INTEGER, y DATE)'.
>> 
>> On Tue, Nov 10, 2015 at 12:28 PM, Julien Le Dem <julien@dremio.com> wrote:
>>> In the patch I just sent, probably not.
>>> I will adjust it and add the corresponding test.
>>> 
>>> On Tue, Nov 10, 2015 at 11:51 AM, Julian Hyde <jhyde@apache.org> wrote:
>>> 
>>>> Can you use both together? Say
>>>> 
>>>>  select columns
>>>>  from dfs.`/path/to/myfile`(type => 'TEXT', fieldDelimiter => '|’)
>> EXTEND
>>>> (foo INTEGER)
>>>> 
>>>> Julian
>>>> 
>>>> 
>>>> 
>>>>> On Nov 10, 2015, at 10:51 AM, Julien Le Dem <julien@dremio.com>
>> wrote:
>>>>> 
>>>>> I took a stab at adding the TableFunction syntax without table(...) in
>>>>> Calcite.
>>>>> I have verified that both the table function and extend (with or
>> without
>>>>> keyword) work
>>>>> 
>>>> 
>> https://github.com/julienledem/calcite/commit/b18f335c49e273294c2d475e359c610aaed3da34
>>>>> 
>>>>> These work:
>>>>> 
>>>>> select columns from dfs.`/path/to/myfile`(type => 'TEXT',
>> fieldDelimiter
>>>> =>
>>>>> '|')
>>>>> 
>>>>> select columns from table(dfs.`/path/to/myfile`(type => 'TEXT',
>>>>> fieldDelimiter => '|'))
>>>>> 
>>>>> select columns from table(dfs.`/path/to/myfile`('JSON'))
>>>>> 
>>>>> select columns from dfs.`/path/to/myfile`('JSON')
>>>>> 
>>>>> select columns from dfs.`/path/to/myfile`(type => 'JSON')
>>>>> 
>>>>> On Sat, Nov 7, 2015 at 5:15 PM, Jacques Nadeau <jacques@apache.org>
>>>> wrote:
>>>>> 
>>>>>> Drill does implicitly what Phoenix does explicitly so I don't think
>> we
>>>>>> should constrain ourselves to having a union of the two syntaxes.
>>>>>> 
>>>>>> 
>>>>>> That being said, I think we could make these work together... maybe.
>>>>>> 
>>>>>> Remove the EXTENDS without keyword syntax from the grammar.
>>>>>> 
>>>>>> Create a new sub block in the table block that requires no keyword.
>>>> There
>>>>>> would be two paths (and would probably require some lookahead)
>>>>>> 
>>>>>> option 1> unnamed parameters (1,2,3)
>>>>>> option 2> named parameters (a => 1, b=>2, c=> 3)
>>>>>> option 3> create table field pattern (favoriteBand VARCHAR(100),
>>>>>> golfHandicap INTEGER)
>>>>>> 
>>>>>> Then we create a table function with options 1 & 2, an EXTENDS
clause
>>>> for
>>>>>> option 3.
>>>>>> 
>>>>>> Best of both worlds?
>>>>>> 
>>>>>> On Sat, Nov 7, 2015 at 4:44 PM, James Taylor <jamestaylor@apache.org
>>> 
>>>>>> wrote:
>>>>>> 
>>>>>>> Phoenix already supports columns at read-time using the syntax
>> without
>>>>>> the
>>>>>>> EXTENDS keyword as Julian indicated:
>>>>>>>  SELECT * FROM Emp (favoriteBand VARCHAR(100), golfHandicap
>> INTEGER)
>>>>>>>  WHERE goldHandicap < 10;
>>>>>>> 
>>>>>>> Changing this by requiring the EXTENDS keyword would create a
>> backward
>>>>>>> compatibility problem.
>>>>>>> 
>>>>>>> I think it'd be good if both of these extensions worked in Drill
&
>>>>>> Phoenix
>>>>>>> given our Drillix initiative.
>>>>>>> 
>>>>>>> On Sat, Nov 7, 2015 at 3:34 PM, Jacques Nadeau <jacques@dremio.com>
>>>>>> wrote:
>>>>>>> 
>>>>>>>> My proposal was an a or b using the freemarker template in
the
>>>> grammar,
>>>>>>>> not something later.
>>>>>>>> 
>>>>>>>> Actually, put another way: we may want to consider stating
that we
>>>> only
>>>>>>>> incorporate SQL standards in our primary grammar. Any extensions
>>>> should
>>>>>>> be
>>>>>>>> optional grammar. We could simply have grammar plugins in
Calcite
>> (the
>>>>>>> same
>>>>>>>> way we plug in external things in Drill).
>>>>>>>> 
>>>>>>>> Trying to get every project to agree on extensions seems
like it
>> may
>>>> be
>>>>>>>> hard.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Jacques Nadeau
>>>>>>>> CTO and Co-Founder, Dremio
>>>>>>>> 
>>>>>>>> On Sat, Nov 7, 2015 at 2:45 PM, Julian Hyde <jhyde@apache.org>
>> wrote:
>>>>>>>> 
>>>>>>>>> I can see why Jacques wants this syntax.
>>>>>>>>> 
>>>>>>>>> However a “switch" in a grammar is a bad idea. Grammars
need to be
>>>>>>>>> predictable. Any variation should happen at validation
time, or
>>>> later.
>>>>>>>>> 
>>>>>>>>> Also, we shouldn’t add configuration parameters as
a way of
>> avoiding
>>>> a
>>>>>>>>> tough design discussion.
>>>>>>>>> 
>>>>>>>>> EXTENDS and eliding TABLE are both extensions to standard
SQL, and
>>>>>> they
>>>>>>>>> are both applicable to Drill and Phoenix. I think Drill
and
>> Phoenix
>>>>>> (by
>>>>>>>>> which I mean Jacques and James, I guess) need to agree
what the
>> SQL
>>>>>>> syntax
>>>>>>>>> should be.
>>>>>>>>> 
>>>>>>>>> Julian
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Nov 7, 2015, at 10:40 AM, Jim Scott <jscott@maprtech.com>
>> wrote:
>>>>>>>>>> 
>>>>>>>>>> Looking at those two examples I agree with Jacques.
The first
>>>>>> appears
>>>>>>>>> more
>>>>>>>>>> like a hint from the syntactic sugar point of view.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <
>> jacques@dremio.com
>>>>>>> 
>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Since EXTEND is custom functionality, it seems
reasonable that
>> we
>>>>>>> could
>>>>>>>>>>> have a switch. Given that SQL Server and Postgres
support it
>> seems
>>>>>>>>>>> reasonable to support the table functions without
the TABLE
>> syntax.
>>>>>>>>>>> 
>>>>>>>>>>> I for one definitely think the TABLE syntax is
much more
>> confusing
>>>>>> to
>>>>>>>>> use,
>>>>>>>>>>> especially in the example that we're looking
to support, such
>> as:
>>>>>>>>>>> 
>>>>>>>>>>> select * from dfs.`/myfolder/mytable` (type =>
'CSV',
>>>>>> fieldDelimiter
>>>>>>> =>
>>>>>>>>>>> '|', skipFirstRow => true)
>>>>>>>>>>> 
>>>>>>>>>>> This seems much clearer than:
>>>>>>>>>>> 
>>>>>>>>>>> select * from TABLE(dfs.`/myfolder/mytable` (type
=> 'CSV',
>>>>>>>>> fieldDelimiter
>>>>>>>>>>> => '|', skipFirstRow => true))
>>>>>>>>>>> 
>>>>>>>>>>> It also looks much more like a hint to the table
(which is our
>>>>>> goal).
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> --
>>>>>>>>>>> Jacques Nadeau
>>>>>>>>>>> CTO and Co-Founder, Dremio
>>>>>>>>>>> 
>>>>>>>>>>> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jhyde@apache.org>
>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Thanks for doing the legwork and finding
what the other vendors
>>>>>> do.
>>>>>>>>> It is
>>>>>>>>>>>> indeed compelling that SQL Server and Postgres
go beyond the
>>>>>>> standard
>>>>>>>>> an
>>>>>>>>>>>> make the TABLE keyword optional.
>>>>>>>>>>>> 
>>>>>>>>>>>> I tried that syntax in Calcite and discovered
that there is a
>>>>>> clash
>>>>>>>>> with
>>>>>>>>>>>> one of our own (few) extensions. In
>>>>>>>>>>>> https://issues.apache.org/jira/browse/CALCITE-493
we added the
>>>>>>>>> EXTENDS
>>>>>>>>>>>> clause. You can write
>>>>>>>>>>>> 
>>>>>>>>>>>> SELECT *
>>>>>>>>>>>> FROM Emp EXTEND (favoriteBand VARCHAR(100),
golfHandicap
>> INTEGER)
>>>>>>>>>>>> WHERE goldHandicap < 10;
>>>>>>>>>>>> 
>>>>>>>>>>>> to tell Calcite that there are two undeclared
columns in the
>> Emp
>>>>>>> table
>>>>>>>>>>> but
>>>>>>>>>>>> you would like to use them in this particular
query. We chose
>> to
>>>>>>> make
>>>>>>>>> the
>>>>>>>>>>>> EXTEND keyword optional, so you could instead
write
>>>>>>>>>>>> 
>>>>>>>>>>>> SELECT *
>>>>>>>>>>>> FROM Emp (favoriteBand VARCHAR(100), golfHandicap
INTEGER)
>>>>>>>>>>>> WHERE goldHandicap < 10;
>>>>>>>>>>>> 
>>>>>>>>>>>> That is uncomfortably close to
>>>>>>>>>>>> 
>>>>>>>>>>>> SELECT *
>>>>>>>>>>>> FROM EmpFunction (favoriteBand, golfHandicap);
>>>>>>>>>>>> 
>>>>>>>>>>>> so we would require
>>>>>>>>>>>> 
>>>>>>>>>>>> SELECT *
>>>>>>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
>>>>>>>>>>>> 
>>>>>>>>>>>> if EmpFunction was a table-function. You
could combine the two
>>>>>> forms
>>>>>>>>> like
>>>>>>>>>>>> this:
>>>>>>>>>>>> 
>>>>>>>>>>>> SELECT *
>>>>>>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap))
EXTEND
>>>>>>>>>>>> (anotherAttribute INTEGER);
>>>>>>>>>>>> 
>>>>>>>>>>>> We could revisit whether EXTEND is optional,
I suppose. But we
>>>>>>> should
>>>>>>>>>>> also
>>>>>>>>>>>> ask whether requiring folks to type TABLE
is such a hardship.
>>>>>>>>>>>> 
>>>>>>>>>>>> Julian
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Nov 6, 2015, at 2:20 PM, Julien Le
Dem <julien@dremio.com>
>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> - Table function syntax: I did a quick
search and it seems
>>>>>> there's
>>>>>>> no
>>>>>>>>>>>>> consensus about this.
>>>>>>>>>>>>> It seems that Posgres [1] and SQL Server
[2] both allow
>> calling
>>>>>>> table
>>>>>>>>>>>>> functions without the table(...) wrapper
while Oracle [3] and
>> DB2
>>>>>>> [4]
>>>>>>>>>>>>> expect it.
>>>>>>>>>>>>> MySQL does not have table functions [5]
>>>>>>>>>>>>> 2 for, 2 against and 1 undecided: that's
a draw :)
>>>>>>>>>>>>> Would it be reasonable to allow a switch
in the grammar
>>>>>> generation
>>>>>>> to
>>>>>>>>>>>> have
>>>>>>>>>>>>> a posgres compatible syntax? Currently
in Drill we use the
>> MySQL
>>>>>>> like
>>>>>>>>>>>>> syntax (back ticks for identifiers etc)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> [1]
>>>>>>>>>>> 
>>>>>> http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
>>>>>>>>>>>>> [2]
>>>>>>>>>>> 
>>>>>> https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
>>>>>>>>>>>>> [3]
>>>>>>> https://oracle-base.com/articles/misc/pipelined-table-functions
>>>>>>>>>>>>> [4]
>>>>>>> http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
>>>>>>>>>>>>> [5]
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>> 
>> http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
>>>>>>>>>>>>> 
>>>>>>>>>>>>> - It seems a simple change in SqlCallBinding
fixes the
>> function
>>>>>>>>>>>>> overloading: https://github.com/apache/calcite/pull/166/files
>>>>>>>>>>>>> But that seems too easy to be true. Possibly
this method is
>>>>>> called
>>>>>>>>> more
>>>>>>>>>>>>> than once (before and after the function
has been resolved?)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> FYI this would happen only when using
named parameter. We do
>> want
>>>>>>> to
>>>>>>>>>>>>> overload in this case, which is why I'm
looking into it.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I'll fill a JIRA for my other branch
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Julien
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Thu, Nov 5, 2015 at 5:39 PM, Julian
Hyde <jhyde@apache.org
>>> 
>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Nov 5, 2015, at 5:00 PM, Julien
Le Dem <julien@dremio.com
>>> 
>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> TL;DR: TableMacro works for me; I
need help with a bug in
>>>>>> Calcite
>>>>>>>>> when
>>>>>>>>>>>>>> there's more than 1 function with
the same name.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Yes; see below.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> FYI: I have a prototype of TableMacro
working in Drill. For
>> now
>>>>>>> just
>>>>>>>>>>>> being
>>>>>>>>>>>>>> able to specify the delimiter for
csv files.
>>>>>>>>>>>>>> So it seem the answer to my question
1) is that TableMacros
>> are
>>>>>>> the
>>>>>>>>>>> way
>>>>>>>>>>>> to
>>>>>>>>>>>>>> go.
>>>>>>>>>>>>>> I'm still wondering about *3) is
the table(...) wrapping
>> syntax
>>>>>>>>>>>>>> necessary?*
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Consider:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> select * from myTable as f(x, y)
>>>>>>>>>>>>>> select * from myTable f(x, y)
>>>>>>>>>>>>>> select * from myFunction(x, y)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> #1 and #2 mean the same thing; #2
and #3 look awfully
>> similar.
>>>>>>> Also,
>>>>>>>>>>> if
>>>>>>>>>>>> f
>>>>>>>>>>>>>> is a function with zero arguments,
could you invoke it like
>>>>>> this?:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> select * from f
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I don’t know the actual rationale.
But I know that the SQL
>>>>>>> standards
>>>>>>>>>>>>>> people in their wisdom decided to
add a keyword to
>> disambiguate.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I had to fix some things in Calcite
to enable this:
>>>>>>>>>>>>>> https://github.com/dremio/calcite/pull/1/files
>>>>>>>>>>>>>> Drill uses Frameworks.getPlanner()
that does not seem to be
>> used
>>>>>>> in
>>>>>>>>>>>>>> Calcite for the Maze example.
>>>>>>>>>>>>>> Which is why some hooks were missing.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Can you log a jira case to track
this bug?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I think I found a bug in Calcite
but I'd need help to fix it.
>>>>>>>>>>>>>> Here is a test that reproduces the
problem:
>>>>>>>>>>>>>> https://github.com/apache/calcite/pull/166
>>>>>>>>>>>>>> If we return more than 1 TableFunction
with the same name, we
>>>>>> get
>>>>>>> a
>>>>>>>>>>> NPE
>>>>>>>>>>>>>> later on.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Yes, I knew there was a problem with
overloading. Please log
>> a
>>>>>>> JIRA
>>>>>>>>>>> case
>>>>>>>>>>>>>> on resolution of overloaded functions
when invoked with named
>>>>>>>>>>> arguments.
>>>>>>>>>>>>>> (It probably applies to all functions,
not just table
>>>>>> functions.)
>>>>>>>>> The
>>>>>>>>>>>> fix
>>>>>>>>>>>>>> will take a while (if you wait for
me to write it).
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> For now please tell your users not
to overload. :)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Julian
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Julien
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> *Jim Scott*
>>>>>>>>>> Director, Enterprise Strategy & Architecture
>>>>>>>>>> +1 (347) 746-9281
>>>>>>>>>> @kingmesal <https://twitter.com/kingmesal>
>>>>>>>>>> 
>>>>>>>>>> <http://www.mapr.com/>
>>>>>>>>>> [image: MapR Technologies] <http://www.mapr.com>
>>>>>>>>>> 
>>>>>>>>>> Now Available - Free Hadoop On-Demand Training
>>>>>>>>>> <
>>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>> 
>> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Julien
>>>> 
>>>> 
>>> 
>>> 
>>> --
>>> Julien
>> 
> 
> 
> 
> -- 
> Julien


Mime
View raw message