drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Julian Hyde <jh...@apache.org>
Subject Re: select from table with options
Date Tue, 10 Nov 2015 20:51:54 GMT
To be clear, it should be possible to use a table function with all of
the options -- EXTENDS clause, OVER clause, AS with alias and column
aliases, TABLESAMPLE.

I'm surprised that the parser didn't need more lookahead to choose
between 't (x, y)' and 't (x INTEGER, y DATE)'.

On Tue, Nov 10, 2015 at 12:28 PM, Julien Le Dem <julien@dremio.com> wrote:
> In the patch I just sent, probably not.
> I will adjust it and add the corresponding test.
>
> On Tue, Nov 10, 2015 at 11:51 AM, Julian Hyde <jhyde@apache.org> wrote:
>
>> Can you use both together? Say
>>
>>   select columns
>>   from dfs.`/path/to/myfile`(type => 'TEXT', fieldDelimiter => '|’) EXTEND
>> (foo INTEGER)
>>
>> Julian
>>
>>
>>
>> > On Nov 10, 2015, at 10:51 AM, Julien Le Dem <julien@dremio.com> wrote:
>> >
>> > I took a stab at adding the TableFunction syntax without table(...) in
>> > Calcite.
>> > I have verified that both the table function and extend (with or without
>> > keyword) work
>> >
>> https://github.com/julienledem/calcite/commit/b18f335c49e273294c2d475e359c610aaed3da34
>> >
>> > These work:
>> >
>> > select columns from dfs.`/path/to/myfile`(type => 'TEXT', fieldDelimiter
>> =>
>> > '|')
>> >
>> > select columns from table(dfs.`/path/to/myfile`(type => 'TEXT',
>> > fieldDelimiter => '|'))
>> >
>> > select columns from table(dfs.`/path/to/myfile`('JSON'))
>> >
>> > select columns from dfs.`/path/to/myfile`('JSON')
>> >
>> > select columns from dfs.`/path/to/myfile`(type => 'JSON')
>> >
>> > On Sat, Nov 7, 2015 at 5:15 PM, Jacques Nadeau <jacques@apache.org>
>> wrote:
>> >
>> >> Drill does implicitly what Phoenix does explicitly so I don't think we
>> >> should constrain ourselves to having a union of the two syntaxes.
>> >>
>> >>
>> >> That being said, I think we could make these work together... maybe.
>> >>
>> >> Remove the EXTENDS without keyword syntax from the grammar.
>> >>
>> >> Create a new sub block in the table block that requires no keyword.
>> There
>> >> would be two paths (and would probably require some lookahead)
>> >>
>> >> option 1> unnamed parameters (1,2,3)
>> >> option 2> named parameters (a => 1, b=>2, c=> 3)
>> >> option 3> create table field pattern (favoriteBand VARCHAR(100),
>> >> golfHandicap INTEGER)
>> >>
>> >> Then we create a table function with options 1 & 2, an EXTENDS clause
>> for
>> >> option 3.
>> >>
>> >> Best of both worlds?
>> >>
>> >> On Sat, Nov 7, 2015 at 4:44 PM, James Taylor <jamestaylor@apache.org>
>> >> wrote:
>> >>
>> >>> Phoenix already supports columns at read-time using the syntax without
>> >> the
>> >>> EXTENDS keyword as Julian indicated:
>> >>>   SELECT * FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>> >>>   WHERE goldHandicap < 10;
>> >>>
>> >>> Changing this by requiring the EXTENDS keyword would create a backward
>> >>> compatibility problem.
>> >>>
>> >>> I think it'd be good if both of these extensions worked in Drill &
>> >> Phoenix
>> >>> given our Drillix initiative.
>> >>>
>> >>> On Sat, Nov 7, 2015 at 3:34 PM, Jacques Nadeau <jacques@dremio.com>
>> >> wrote:
>> >>>
>> >>>> My proposal was an a or b using the freemarker template in the
>> grammar,
>> >>>> not something later.
>> >>>>
>> >>>> Actually, put another way: we may want to consider stating that
we
>> only
>> >>>> incorporate SQL standards in our primary grammar. Any extensions
>> should
>> >>> be
>> >>>> optional grammar. We could simply have grammar plugins in Calcite
(the
>> >>> same
>> >>>> way we plug in external things in Drill).
>> >>>>
>> >>>> Trying to get every project to agree on extensions seems like it
may
>> be
>> >>>> hard.
>> >>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> Jacques Nadeau
>> >>>> CTO and Co-Founder, Dremio
>> >>>>
>> >>>> On Sat, Nov 7, 2015 at 2:45 PM, Julian Hyde <jhyde@apache.org>
wrote:
>> >>>>
>> >>>>> I can see why Jacques wants this syntax.
>> >>>>>
>> >>>>> However a “switch" in a grammar is a bad idea. Grammars need
to be
>> >>>>> predictable. Any variation should happen at validation time,
or
>> later.
>> >>>>>
>> >>>>> Also, we shouldn’t add configuration parameters as a way of
avoiding
>> a
>> >>>>> tough design discussion.
>> >>>>>
>> >>>>> EXTENDS and eliding TABLE are both extensions to standard SQL,
and
>> >> they
>> >>>>> are both applicable to Drill and Phoenix. I think Drill and
Phoenix
>> >> (by
>> >>>>> which I mean Jacques and James, I guess) need to agree what
the SQL
>> >>> syntax
>> >>>>> should be.
>> >>>>>
>> >>>>> Julian
>> >>>>>
>> >>>>>
>> >>>>>> On Nov 7, 2015, at 10:40 AM, Jim Scott <jscott@maprtech.com>
wrote:
>> >>>>>>
>> >>>>>> Looking at those two examples I agree with Jacques. The
first
>> >> appears
>> >>>>> more
>> >>>>>> like a hint from the syntactic sugar point of view.
>> >>>>>>
>> >>>>>>
>> >>>>>> On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <jacques@dremio.com
>> >>>
>> >>>>> wrote:
>> >>>>>>
>> >>>>>>> Since EXTEND is custom functionality, it seems reasonable
that we
>> >>> could
>> >>>>>>> have a switch. Given that SQL Server and Postgres support
it seems
>> >>>>>>> reasonable to support the table functions without the
TABLE syntax.
>> >>>>>>>
>> >>>>>>> I for one definitely think the TABLE syntax is much
more confusing
>> >> to
>> >>>>> use,
>> >>>>>>> especially in the example that we're looking to support,
such as:
>> >>>>>>>
>> >>>>>>> select * from dfs.`/myfolder/mytable` (type => 'CSV',
>> >> fieldDelimiter
>> >>> =>
>> >>>>>>> '|', skipFirstRow => true)
>> >>>>>>>
>> >>>>>>> This seems much clearer than:
>> >>>>>>>
>> >>>>>>> select * from TABLE(dfs.`/myfolder/mytable` (type =>
'CSV',
>> >>>>> fieldDelimiter
>> >>>>>>> => '|', skipFirstRow => true))
>> >>>>>>>
>> >>>>>>> It also looks much more like a hint to the table (which
is our
>> >> goal).
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> --
>> >>>>>>> Jacques Nadeau
>> >>>>>>> CTO and Co-Founder, Dremio
>> >>>>>>>
>> >>>>>>> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jhyde@apache.org>
>> >>> wrote:
>> >>>>>>>
>> >>>>>>>> Thanks for doing the legwork and finding what the
other vendors
>> >> do.
>> >>>>> It is
>> >>>>>>>> indeed compelling that SQL Server and Postgres go
beyond the
>> >>> standard
>> >>>>> an
>> >>>>>>>> make the TABLE keyword optional.
>> >>>>>>>>
>> >>>>>>>> I tried that syntax in Calcite and discovered that
there is a
>> >> clash
>> >>>>> with
>> >>>>>>>> one of our own (few) extensions. In
>> >>>>>>>> https://issues.apache.org/jira/browse/CALCITE-493
we added the
>> >>>>> EXTENDS
>> >>>>>>>> clause. You can write
>> >>>>>>>>
>> >>>>>>>> SELECT *
>> >>>>>>>> FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap
INTEGER)
>> >>>>>>>> WHERE goldHandicap < 10;
>> >>>>>>>>
>> >>>>>>>> to tell Calcite that there are two undeclared columns
in the Emp
>> >>> table
>> >>>>>>> but
>> >>>>>>>> you would like to use them in this particular query.
We chose to
>> >>> make
>> >>>>> the
>> >>>>>>>> EXTEND keyword optional, so you could instead write
>> >>>>>>>>
>> >>>>>>>> SELECT *
>> >>>>>>>> FROM Emp (favoriteBand VARCHAR(100), golfHandicap
INTEGER)
>> >>>>>>>> WHERE goldHandicap < 10;
>> >>>>>>>>
>> >>>>>>>> That is uncomfortably close to
>> >>>>>>>>
>> >>>>>>>> SELECT *
>> >>>>>>>> FROM EmpFunction (favoriteBand, golfHandicap);
>> >>>>>>>>
>> >>>>>>>> so we would require
>> >>>>>>>>
>> >>>>>>>> SELECT *
>> >>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
>> >>>>>>>>
>> >>>>>>>> if EmpFunction was a table-function. You could combine
the two
>> >> forms
>> >>>>> like
>> >>>>>>>> this:
>> >>>>>>>>
>> >>>>>>>> SELECT *
>> >>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap))
EXTEND
>> >>>>>>>> (anotherAttribute INTEGER);
>> >>>>>>>>
>> >>>>>>>> We could revisit whether EXTEND is optional, I suppose.
But we
>> >>> should
>> >>>>>>> also
>> >>>>>>>> ask whether requiring folks to type TABLE is such
a hardship.
>> >>>>>>>>
>> >>>>>>>> Julian
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <julien@dremio.com>
>> >>> wrote:
>> >>>>>>>>>
>> >>>>>>>>> - Table function syntax: I did a quick search
and it seems
>> >> there's
>> >>> no
>> >>>>>>>>> consensus about this.
>> >>>>>>>>> It seems that Posgres [1] and SQL Server [2]
both allow calling
>> >>> table
>> >>>>>>>>> functions without the table(...) wrapper while
Oracle [3] and DB2
>> >>> [4]
>> >>>>>>>>> expect it.
>> >>>>>>>>> MySQL does not have table functions [5]
>> >>>>>>>>> 2 for, 2 against and 1 undecided: that's a draw
:)
>> >>>>>>>>> Would it be reasonable to allow a switch in
the grammar
>> >> generation
>> >>> to
>> >>>>>>>> have
>> >>>>>>>>> a posgres compatible syntax? Currently in Drill
we use the MySQL
>> >>> like
>> >>>>>>>>> syntax (back ticks for identifiers etc)
>> >>>>>>>>>
>> >>>>>>>>> [1]
>> >>>>>>>
>> >> http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
>> >>>>>>>>> [2]
>> >>>>>>>
>> >> https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
>> >>>>>>>>> [3]
>> >>> https://oracle-base.com/articles/misc/pipelined-table-functions
>> >>>>>>>>> [4]
>> >>> http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
>> >>>>>>>>> [5]
>> >>>>>>>>>
>> >>>>>>>>
>> >>>>>>>
>> >>>>>
>> >>>
>> >>
>> http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
>> >>>>>>>>>
>> >>>>>>>>> - It seems a simple change in SqlCallBinding
fixes the function
>> >>>>>>>>> overloading: https://github.com/apache/calcite/pull/166/files
>> >>>>>>>>> But that seems too easy to be true. Possibly
this method is
>> >> called
>> >>>>> more
>> >>>>>>>>> than once (before and after the function has
been resolved?)
>> >>>>>>>>>
>> >>>>>>>>> FYI this would happen only when using named
parameter. We do want
>> >>> to
>> >>>>>>>>> overload in this case, which is why I'm looking
into it.
>> >>>>>>>>>
>> >>>>>>>>> I'll fill a JIRA for my other branch
>> >>>>>>>>>
>> >>>>>>>>> Julien
>> >>>>>>>>>
>> >>>>>>>>> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde
<jhyde@apache.org>
>> >>>>> wrote:
>> >>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem
<julien@dremio.com>
>> >>>>> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>> TL;DR: TableMacro works for me; I need help
with a bug in
>> >> Calcite
>> >>>>> when
>> >>>>>>>>>> there's more than 1 function with the same
name.
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> Yes; see below.
>> >>>>>>>>>>
>> >>>>>>>>>> FYI: I have a prototype of TableMacro working
in Drill. For now
>> >>> just
>> >>>>>>>> being
>> >>>>>>>>>> able to specify the delimiter for csv files.
>> >>>>>>>>>> So it seem the answer to my question 1)
is that TableMacros are
>> >>> the
>> >>>>>>> way
>> >>>>>>>> to
>> >>>>>>>>>> go.
>> >>>>>>>>>> I'm still wondering about *3) is the table(...)
wrapping syntax
>> >>>>>>>>>> necessary?*
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> Consider:
>> >>>>>>>>>>
>> >>>>>>>>>> select * from myTable as f(x, y)
>> >>>>>>>>>> select * from myTable f(x, y)
>> >>>>>>>>>> select * from myFunction(x, y)
>> >>>>>>>>>>
>> >>>>>>>>>> #1 and #2 mean the same thing; #2 and #3
look awfully similar.
>> >>> Also,
>> >>>>>>> if
>> >>>>>>>> f
>> >>>>>>>>>> is a function with zero arguments, could
you invoke it like
>> >> this?:
>> >>>>>>>>>>
>> >>>>>>>>>> select * from f
>> >>>>>>>>>>
>> >>>>>>>>>> I don’t know the actual rationale. But
I know that the SQL
>> >>> standards
>> >>>>>>>>>> people in their wisdom decided to add a
keyword to disambiguate.
>> >>>>>>>>>>
>> >>>>>>>>>> I had to fix some things in Calcite to enable
this:
>> >>>>>>>>>> https://github.com/dremio/calcite/pull/1/files
>> >>>>>>>>>> Drill uses Frameworks.getPlanner() that
does not seem to be used
>> >>> in
>> >>>>>>>>>> Calcite for the Maze example.
>> >>>>>>>>>> Which is why some hooks were missing.
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> Can you log a jira case to track this bug?
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> I think I found a bug in Calcite but I'd
need help to fix it.
>> >>>>>>>>>> Here is a test that reproduces the problem:
>> >>>>>>>>>> https://github.com/apache/calcite/pull/166
>> >>>>>>>>>> If we return more than 1 TableFunction with
the same name, we
>> >> get
>> >>> a
>> >>>>>>> NPE
>> >>>>>>>>>> later on.
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> Yes, I knew there was a problem with overloading.
Please log a
>> >>> JIRA
>> >>>>>>> case
>> >>>>>>>>>> on resolution of overloaded functions when
invoked with named
>> >>>>>>> arguments.
>> >>>>>>>>>> (It probably applies to all functions, not
just table
>> >> functions.)
>> >>>>> The
>> >>>>>>>> fix
>> >>>>>>>>>> will take a while (if you wait for me to
write it).
>> >>>>>>>>>>
>> >>>>>>>>>> For now please tell your users not to overload.
:)
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> Julian
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> --
>> >>>>>>>>> Julien
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> --
>> >>>>>> *Jim Scott*
>> >>>>>> Director, Enterprise Strategy & Architecture
>> >>>>>> +1 (347) 746-9281
>> >>>>>> @kingmesal <https://twitter.com/kingmesal>
>> >>>>>>
>> >>>>>> <http://www.mapr.com/>
>> >>>>>> [image: MapR Technologies] <http://www.mapr.com>
>> >>>>>>
>> >>>>>> Now Available - Free Hadoop On-Demand Training
>> >>>>>> <
>> >>>>>
>> >>>
>> >>
>> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
>> >>>>>>
>> >>>>>
>> >>>>>
>> >>>>
>> >>>
>> >>
>> >
>> >
>> >
>> > --
>> > Julien
>>
>>
>
>
> --
> Julien

Mime
View raw message