drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Julien Le Dem <jul...@dremio.com>
Subject Re: select from table with options
Date Tue, 10 Nov 2015 18:51:58 GMT
I took a stab at adding the TableFunction syntax without table(...) in
Calcite.
I have verified that both the table function and extend (with or without
keyword) work
https://github.com/julienledem/calcite/commit/b18f335c49e273294c2d475e359c610aaed3da34

These work:

select columns from dfs.`/path/to/myfile`(type => 'TEXT', fieldDelimiter =>
'|')

select columns from table(dfs.`/path/to/myfile`(type => 'TEXT',
fieldDelimiter => '|'))

select columns from table(dfs.`/path/to/myfile`('JSON'))

select columns from dfs.`/path/to/myfile`('JSON')

select columns from dfs.`/path/to/myfile`(type => 'JSON')

On Sat, Nov 7, 2015 at 5:15 PM, Jacques Nadeau <jacques@apache.org> wrote:

> Drill does implicitly what Phoenix does explicitly so I don't think we
> should constrain ourselves to having a union of the two syntaxes.
>
>
> That being said, I think we could make these work together... maybe.
>
> Remove the EXTENDS without keyword syntax from the grammar.
>
> Create a new sub block in the table block that requires no keyword. There
> would be two paths (and would probably require some lookahead)
>
> option 1> unnamed parameters (1,2,3)
> option 2> named parameters (a => 1, b=>2, c=> 3)
> option 3> create table field pattern (favoriteBand VARCHAR(100),
> golfHandicap INTEGER)
>
> Then we create a table function with options 1 & 2, an EXTENDS clause for
> option 3.
>
> Best of both worlds?
>
> On Sat, Nov 7, 2015 at 4:44 PM, James Taylor <jamestaylor@apache.org>
> wrote:
>
> > Phoenix already supports columns at read-time using the syntax without
> the
> > EXTENDS keyword as Julian indicated:
> >    SELECT * FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> >    WHERE goldHandicap < 10;
> >
> > Changing this by requiring the EXTENDS keyword would create a backward
> > compatibility problem.
> >
> > I think it'd be good if both of these extensions worked in Drill &
> Phoenix
> > given our Drillix initiative.
> >
> > On Sat, Nov 7, 2015 at 3:34 PM, Jacques Nadeau <jacques@dremio.com>
> wrote:
> >
> > > My proposal was an a or b using the freemarker template in the grammar,
> > > not something later.
> > >
> > > Actually, put another way: we may want to consider stating that we only
> > > incorporate SQL standards in our primary grammar. Any extensions should
> > be
> > > optional grammar. We could simply have grammar plugins in Calcite (the
> > same
> > > way we plug in external things in Drill).
> > >
> > > Trying to get every project to agree on extensions seems like it may be
> > > hard.
> > >
> > >
> > >
> > > --
> > > Jacques Nadeau
> > > CTO and Co-Founder, Dremio
> > >
> > > On Sat, Nov 7, 2015 at 2:45 PM, Julian Hyde <jhyde@apache.org> wrote:
> > >
> > >> I can see why Jacques wants this syntax.
> > >>
> > >> However a “switch" in a grammar is a bad idea. Grammars need to be
> > >> predictable. Any variation should happen at validation time, or later.
> > >>
> > >> Also, we shouldn’t add configuration parameters as a way of avoiding
a
> > >> tough design discussion.
> > >>
> > >> EXTENDS and eliding TABLE are both extensions to standard SQL, and
> they
> > >> are both applicable to Drill and Phoenix. I think Drill and Phoenix
> (by
> > >> which I mean Jacques and James, I guess) need to agree what the SQL
> > syntax
> > >> should be.
> > >>
> > >> Julian
> > >>
> > >>
> > >> > On Nov 7, 2015, at 10:40 AM, Jim Scott <jscott@maprtech.com>
wrote:
> > >> >
> > >> > Looking at those two examples I agree with Jacques. The first
> appears
> > >> more
> > >> > like a hint from the syntactic sugar point of view.
> > >> >
> > >> >
> > >> > On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <jacques@dremio.com
> >
> > >> wrote:
> > >> >
> > >> >> Since EXTEND is custom functionality, it seems reasonable that
we
> > could
> > >> >> have a switch. Given that SQL Server and Postgres support it seems
> > >> >> reasonable to support the table functions without the TABLE syntax.
> > >> >>
> > >> >> I for one definitely think the TABLE syntax is much more confusing
> to
> > >> use,
> > >> >> especially in the example that we're looking to support, such
as:
> > >> >>
> > >> >> select * from dfs.`/myfolder/mytable` (type => 'CSV',
> fieldDelimiter
> > =>
> > >> >> '|', skipFirstRow => true)
> > >> >>
> > >> >> This seems much clearer than:
> > >> >>
> > >> >> select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV',
> > >> fieldDelimiter
> > >> >> => '|', skipFirstRow => true))
> > >> >>
> > >> >> It also looks much more like a hint to the table (which is our
> goal).
> > >> >>
> > >> >>
> > >> >>
> > >> >>
> > >> >>
> > >> >>
> > >> >>
> > >> >> --
> > >> >> Jacques Nadeau
> > >> >> CTO and Co-Founder, Dremio
> > >> >>
> > >> >> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jhyde@apache.org>
> > wrote:
> > >> >>
> > >> >>> Thanks for doing the legwork and finding what the other vendors
> do.
> > >> It is
> > >> >>> indeed compelling that SQL Server and Postgres go beyond the
> > standard
> > >> an
> > >> >>> make the TABLE keyword optional.
> > >> >>>
> > >> >>> I tried that syntax in Calcite and discovered that there is
a
> clash
> > >> with
> > >> >>> one of our own (few) extensions. In
> > >> >>> https://issues.apache.org/jira/browse/CALCITE-493 we added
the
> > >> EXTENDS
> > >> >>> clause. You can write
> > >> >>>
> > >> >>>  SELECT *
> > >> >>>  FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap
INTEGER)
> > >> >>>  WHERE goldHandicap < 10;
> > >> >>>
> > >> >>> to tell Calcite that there are two undeclared columns in the
Emp
> > table
> > >> >> but
> > >> >>> you would like to use them in this particular query. We chose
to
> > make
> > >> the
> > >> >>> EXTEND keyword optional, so you could instead write
> > >> >>>
> > >> >>>  SELECT *
> > >> >>>  FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> > >> >>>  WHERE goldHandicap < 10;
> > >> >>>
> > >> >>> That is uncomfortably close to
> > >> >>>
> > >> >>>  SELECT *
> > >> >>>  FROM EmpFunction (favoriteBand, golfHandicap);
> > >> >>>
> > >> >>> so we would require
> > >> >>>
> > >> >>>  SELECT *
> > >> >>>  FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
> > >> >>>
> > >> >>> if EmpFunction was a table-function. You could combine the
two
> forms
> > >> like
> > >> >>> this:
> > >> >>>
> > >> >>>  SELECT *
> > >> >>>  FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND
> > >> >>> (anotherAttribute INTEGER);
> > >> >>>
> > >> >>> We could revisit whether EXTEND is optional, I suppose. But
we
> > should
> > >> >> also
> > >> >>> ask whether requiring folks to type TABLE is such a hardship.
> > >> >>>
> > >> >>> Julian
> > >> >>>
> > >> >>>
> > >> >>>> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <julien@dremio.com>
> > wrote:
> > >> >>>>
> > >> >>>> - Table function syntax: I did a quick search and it seems
> there's
> > no
> > >> >>>> consensus about this.
> > >> >>>> It seems that Posgres [1] and SQL Server [2] both allow
calling
> > table
> > >> >>>> functions without the table(...) wrapper while Oracle
[3] and DB2
> > [4]
> > >> >>>> expect it.
> > >> >>>> MySQL does not have table functions [5]
> > >> >>>> 2 for, 2 against and 1 undecided: that's a draw :)
> > >> >>>> Would it be reasonable to allow a switch in the grammar
> generation
> > to
> > >> >>> have
> > >> >>>> a posgres compatible syntax? Currently in Drill we use
the MySQL
> > like
> > >> >>>> syntax (back ticks for identifiers etc)
> > >> >>>>
> > >> >>>> [1]
> > >> >>
> http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
> > >> >>>> [2]
> > >> >>
> https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
> > >> >>>> [3]
> > https://oracle-base.com/articles/misc/pipelined-table-functions
> > >> >>>> [4]
> > http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
> > >> >>>> [5]
> > >> >>>>
> > >> >>>
> > >> >>
> > >>
> >
> http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
> > >> >>>>
> > >> >>>> - It seems a simple change in SqlCallBinding fixes the
function
> > >> >>>> overloading: https://github.com/apache/calcite/pull/166/files
> > >> >>>> But that seems too easy to be true. Possibly this method
is
> called
> > >> more
> > >> >>>> than once (before and after the function has been resolved?)
> > >> >>>>
> > >> >>>> FYI this would happen only when using named parameter.
We do want
> > to
> > >> >>>> overload in this case, which is why I'm looking into it.
> > >> >>>>
> > >> >>>> I'll fill a JIRA for my other branch
> > >> >>>>
> > >> >>>> Julien
> > >> >>>>
> > >> >>>> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jhyde@apache.org>
> > >> wrote:
> > >> >>>>
> > >> >>>>>
> > >> >>>>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <julien@dremio.com>
> > >> wrote:
> > >> >>>>>
> > >> >>>>> TL;DR: TableMacro works for me; I need help with a
bug in
> Calcite
> > >> when
> > >> >>>>> there's more than 1 function with the same name.
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> Yes; see below.
> > >> >>>>>
> > >> >>>>> FYI: I have a prototype of TableMacro working in Drill.
For now
> > just
> > >> >>> being
> > >> >>>>> able to specify the delimiter for csv files.
> > >> >>>>> So it seem the answer to my question 1) is that TableMacros
are
> > the
> > >> >> way
> > >> >>> to
> > >> >>>>> go.
> > >> >>>>> I'm still wondering about *3) is the table(...) wrapping
syntax
> > >> >>>>> necessary?*
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> Consider:
> > >> >>>>>
> > >> >>>>> select * from myTable as f(x, y)
> > >> >>>>> select * from myTable f(x, y)
> > >> >>>>> select * from myFunction(x, y)
> > >> >>>>>
> > >> >>>>> #1 and #2 mean the same thing; #2 and #3 look awfully
similar.
> > Also,
> > >> >> if
> > >> >>> f
> > >> >>>>> is a function with zero arguments, could you invoke
it like
> this?:
> > >> >>>>>
> > >> >>>>> select * from f
> > >> >>>>>
> > >> >>>>> I don’t know the actual rationale. But I know that
the SQL
> > standards
> > >> >>>>> people in their wisdom decided to add a keyword to
disambiguate.
> > >> >>>>>
> > >> >>>>> I had to fix some things in Calcite to enable this:
> > >> >>>>> https://github.com/dremio/calcite/pull/1/files
> > >> >>>>> Drill uses Frameworks.getPlanner() that does not seem
to be used
> > in
> > >> >>>>> Calcite for the Maze example.
> > >> >>>>> Which is why some hooks were missing.
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> Can you log a jira case to track this bug?
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> I think I found a bug in Calcite but I'd need help
to fix it.
> > >> >>>>> Here is a test that reproduces the problem:
> > >> >>>>> https://github.com/apache/calcite/pull/166
> > >> >>>>> If we return more than 1 TableFunction with the same
name, we
> get
> > a
> > >> >> NPE
> > >> >>>>> later on.
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> Yes, I knew there was a problem with overloading.
Please log a
> > JIRA
> > >> >> case
> > >> >>>>> on resolution of overloaded functions when invoked
with named
> > >> >> arguments.
> > >> >>>>> (It probably applies to all functions, not just table
> functions.)
> > >> The
> > >> >>> fix
> > >> >>>>> will take a while (if you wait for me to write it).
> > >> >>>>>
> > >> >>>>> For now please tell your users not to overload. :)
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> Julian
> > >> >>>>>
> > >> >>>>>
> > >> >>>>
> > >> >>>>
> > >> >>>> --
> > >> >>>> Julien
> > >> >>>
> > >> >>>
> > >> >>
> > >> >
> > >> >
> > >> >
> > >> > --
> > >> > *Jim Scott*
> > >> > Director, Enterprise Strategy & Architecture
> > >> > +1 (347) 746-9281
> > >> > @kingmesal <https://twitter.com/kingmesal>
> > >> >
> > >> > <http://www.mapr.com/>
> > >> > [image: MapR Technologies] <http://www.mapr.com>
> > >> >
> > >> > Now Available - Free Hadoop On-Demand Training
> > >> > <
> > >>
> >
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> > >> >
> > >>
> > >>
> > >
> >
>



-- 
Julien

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message