drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Julien Le Dem <jul...@dremio.com>
Subject Re: select from table with options
Date Tue, 10 Nov 2015 20:28:42 GMT
In the patch I just sent, probably not.
I will adjust it and add the corresponding test.

On Tue, Nov 10, 2015 at 11:51 AM, Julian Hyde <jhyde@apache.org> wrote:

> Can you use both together? Say
>
>   select columns
>   from dfs.`/path/to/myfile`(type => 'TEXT', fieldDelimiter => '|’) EXTEND
> (foo INTEGER)
>
> Julian
>
>
>
> > On Nov 10, 2015, at 10:51 AM, Julien Le Dem <julien@dremio.com> wrote:
> >
> > I took a stab at adding the TableFunction syntax without table(...) in
> > Calcite.
> > I have verified that both the table function and extend (with or without
> > keyword) work
> >
> https://github.com/julienledem/calcite/commit/b18f335c49e273294c2d475e359c610aaed3da34
> >
> > These work:
> >
> > select columns from dfs.`/path/to/myfile`(type => 'TEXT', fieldDelimiter
> =>
> > '|')
> >
> > select columns from table(dfs.`/path/to/myfile`(type => 'TEXT',
> > fieldDelimiter => '|'))
> >
> > select columns from table(dfs.`/path/to/myfile`('JSON'))
> >
> > select columns from dfs.`/path/to/myfile`('JSON')
> >
> > select columns from dfs.`/path/to/myfile`(type => 'JSON')
> >
> > On Sat, Nov 7, 2015 at 5:15 PM, Jacques Nadeau <jacques@apache.org>
> wrote:
> >
> >> Drill does implicitly what Phoenix does explicitly so I don't think we
> >> should constrain ourselves to having a union of the two syntaxes.
> >>
> >>
> >> That being said, I think we could make these work together... maybe.
> >>
> >> Remove the EXTENDS without keyword syntax from the grammar.
> >>
> >> Create a new sub block in the table block that requires no keyword.
> There
> >> would be two paths (and would probably require some lookahead)
> >>
> >> option 1> unnamed parameters (1,2,3)
> >> option 2> named parameters (a => 1, b=>2, c=> 3)
> >> option 3> create table field pattern (favoriteBand VARCHAR(100),
> >> golfHandicap INTEGER)
> >>
> >> Then we create a table function with options 1 & 2, an EXTENDS clause
> for
> >> option 3.
> >>
> >> Best of both worlds?
> >>
> >> On Sat, Nov 7, 2015 at 4:44 PM, James Taylor <jamestaylor@apache.org>
> >> wrote:
> >>
> >>> Phoenix already supports columns at read-time using the syntax without
> >> the
> >>> EXTENDS keyword as Julian indicated:
> >>>   SELECT * FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> >>>   WHERE goldHandicap < 10;
> >>>
> >>> Changing this by requiring the EXTENDS keyword would create a backward
> >>> compatibility problem.
> >>>
> >>> I think it'd be good if both of these extensions worked in Drill &
> >> Phoenix
> >>> given our Drillix initiative.
> >>>
> >>> On Sat, Nov 7, 2015 at 3:34 PM, Jacques Nadeau <jacques@dremio.com>
> >> wrote:
> >>>
> >>>> My proposal was an a or b using the freemarker template in the
> grammar,
> >>>> not something later.
> >>>>
> >>>> Actually, put another way: we may want to consider stating that we
> only
> >>>> incorporate SQL standards in our primary grammar. Any extensions
> should
> >>> be
> >>>> optional grammar. We could simply have grammar plugins in Calcite (the
> >>> same
> >>>> way we plug in external things in Drill).
> >>>>
> >>>> Trying to get every project to agree on extensions seems like it may
> be
> >>>> hard.
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Jacques Nadeau
> >>>> CTO and Co-Founder, Dremio
> >>>>
> >>>> On Sat, Nov 7, 2015 at 2:45 PM, Julian Hyde <jhyde@apache.org>
wrote:
> >>>>
> >>>>> I can see why Jacques wants this syntax.
> >>>>>
> >>>>> However a “switch" in a grammar is a bad idea. Grammars need to
be
> >>>>> predictable. Any variation should happen at validation time, or
> later.
> >>>>>
> >>>>> Also, we shouldn’t add configuration parameters as a way of avoiding
> a
> >>>>> tough design discussion.
> >>>>>
> >>>>> EXTENDS and eliding TABLE are both extensions to standard SQL, and
> >> they
> >>>>> are both applicable to Drill and Phoenix. I think Drill and Phoenix
> >> (by
> >>>>> which I mean Jacques and James, I guess) need to agree what the
SQL
> >>> syntax
> >>>>> should be.
> >>>>>
> >>>>> Julian
> >>>>>
> >>>>>
> >>>>>> On Nov 7, 2015, at 10:40 AM, Jim Scott <jscott@maprtech.com>
wrote:
> >>>>>>
> >>>>>> Looking at those two examples I agree with Jacques. The first
> >> appears
> >>>>> more
> >>>>>> like a hint from the syntactic sugar point of view.
> >>>>>>
> >>>>>>
> >>>>>> On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <jacques@dremio.com
> >>>
> >>>>> wrote:
> >>>>>>
> >>>>>>> Since EXTEND is custom functionality, it seems reasonable
that we
> >>> could
> >>>>>>> have a switch. Given that SQL Server and Postgres support
it seems
> >>>>>>> reasonable to support the table functions without the TABLE
syntax.
> >>>>>>>
> >>>>>>> I for one definitely think the TABLE syntax is much more
confusing
> >> to
> >>>>> use,
> >>>>>>> especially in the example that we're looking to support,
such as:
> >>>>>>>
> >>>>>>> select * from dfs.`/myfolder/mytable` (type => 'CSV',
> >> fieldDelimiter
> >>> =>
> >>>>>>> '|', skipFirstRow => true)
> >>>>>>>
> >>>>>>> This seems much clearer than:
> >>>>>>>
> >>>>>>> select * from TABLE(dfs.`/myfolder/mytable` (type =>
'CSV',
> >>>>> fieldDelimiter
> >>>>>>> => '|', skipFirstRow => true))
> >>>>>>>
> >>>>>>> It also looks much more like a hint to the table (which
is our
> >> goal).
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Jacques Nadeau
> >>>>>>> CTO and Co-Founder, Dremio
> >>>>>>>
> >>>>>>> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jhyde@apache.org>
> >>> wrote:
> >>>>>>>
> >>>>>>>> Thanks for doing the legwork and finding what the other
vendors
> >> do.
> >>>>> It is
> >>>>>>>> indeed compelling that SQL Server and Postgres go beyond
the
> >>> standard
> >>>>> an
> >>>>>>>> make the TABLE keyword optional.
> >>>>>>>>
> >>>>>>>> I tried that syntax in Calcite and discovered that there
is a
> >> clash
> >>>>> with
> >>>>>>>> one of our own (few) extensions. In
> >>>>>>>> https://issues.apache.org/jira/browse/CALCITE-493 we
added the
> >>>>> EXTENDS
> >>>>>>>> clause. You can write
> >>>>>>>>
> >>>>>>>> SELECT *
> >>>>>>>> FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap
INTEGER)
> >>>>>>>> WHERE goldHandicap < 10;
> >>>>>>>>
> >>>>>>>> to tell Calcite that there are two undeclared columns
in the Emp
> >>> table
> >>>>>>> but
> >>>>>>>> you would like to use them in this particular query.
We chose to
> >>> make
> >>>>> the
> >>>>>>>> EXTEND keyword optional, so you could instead write
> >>>>>>>>
> >>>>>>>> SELECT *
> >>>>>>>> FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> >>>>>>>> WHERE goldHandicap < 10;
> >>>>>>>>
> >>>>>>>> That is uncomfortably close to
> >>>>>>>>
> >>>>>>>> SELECT *
> >>>>>>>> FROM EmpFunction (favoriteBand, golfHandicap);
> >>>>>>>>
> >>>>>>>> so we would require
> >>>>>>>>
> >>>>>>>> SELECT *
> >>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
> >>>>>>>>
> >>>>>>>> if EmpFunction was a table-function. You could combine
the two
> >> forms
> >>>>> like
> >>>>>>>> this:
> >>>>>>>>
> >>>>>>>> SELECT *
> >>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap))
EXTEND
> >>>>>>>> (anotherAttribute INTEGER);
> >>>>>>>>
> >>>>>>>> We could revisit whether EXTEND is optional, I suppose.
But we
> >>> should
> >>>>>>> also
> >>>>>>>> ask whether requiring folks to type TABLE is such a
hardship.
> >>>>>>>>
> >>>>>>>> Julian
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <julien@dremio.com>
> >>> wrote:
> >>>>>>>>>
> >>>>>>>>> - Table function syntax: I did a quick search and
it seems
> >> there's
> >>> no
> >>>>>>>>> consensus about this.
> >>>>>>>>> It seems that Posgres [1] and SQL Server [2] both
allow calling
> >>> table
> >>>>>>>>> functions without the table(...) wrapper while Oracle
[3] and DB2
> >>> [4]
> >>>>>>>>> expect it.
> >>>>>>>>> MySQL does not have table functions [5]
> >>>>>>>>> 2 for, 2 against and 1 undecided: that's a draw
:)
> >>>>>>>>> Would it be reasonable to allow a switch in the
grammar
> >> generation
> >>> to
> >>>>>>>> have
> >>>>>>>>> a posgres compatible syntax? Currently in Drill
we use the MySQL
> >>> like
> >>>>>>>>> syntax (back ticks for identifiers etc)
> >>>>>>>>>
> >>>>>>>>> [1]
> >>>>>>>
> >> http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
> >>>>>>>>> [2]
> >>>>>>>
> >> https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
> >>>>>>>>> [3]
> >>> https://oracle-base.com/articles/misc/pipelined-table-functions
> >>>>>>>>> [4]
> >>> http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
> >>>>>>>>> [5]
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>
> >>>
> >>
> http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
> >>>>>>>>>
> >>>>>>>>> - It seems a simple change in SqlCallBinding fixes
the function
> >>>>>>>>> overloading: https://github.com/apache/calcite/pull/166/files
> >>>>>>>>> But that seems too easy to be true. Possibly this
method is
> >> called
> >>>>> more
> >>>>>>>>> than once (before and after the function has been
resolved?)
> >>>>>>>>>
> >>>>>>>>> FYI this would happen only when using named parameter.
We do want
> >>> to
> >>>>>>>>> overload in this case, which is why I'm looking
into it.
> >>>>>>>>>
> >>>>>>>>> I'll fill a JIRA for my other branch
> >>>>>>>>>
> >>>>>>>>> Julien
> >>>>>>>>>
> >>>>>>>>> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jhyde@apache.org>
> >>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <julien@dremio.com>
> >>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>> TL;DR: TableMacro works for me; I need help
with a bug in
> >> Calcite
> >>>>> when
> >>>>>>>>>> there's more than 1 function with the same name.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Yes; see below.
> >>>>>>>>>>
> >>>>>>>>>> FYI: I have a prototype of TableMacro working
in Drill. For now
> >>> just
> >>>>>>>> being
> >>>>>>>>>> able to specify the delimiter for csv files.
> >>>>>>>>>> So it seem the answer to my question 1) is that
TableMacros are
> >>> the
> >>>>>>> way
> >>>>>>>> to
> >>>>>>>>>> go.
> >>>>>>>>>> I'm still wondering about *3) is the table(...)
wrapping syntax
> >>>>>>>>>> necessary?*
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Consider:
> >>>>>>>>>>
> >>>>>>>>>> select * from myTable as f(x, y)
> >>>>>>>>>> select * from myTable f(x, y)
> >>>>>>>>>> select * from myFunction(x, y)
> >>>>>>>>>>
> >>>>>>>>>> #1 and #2 mean the same thing; #2 and #3 look
awfully similar.
> >>> Also,
> >>>>>>> if
> >>>>>>>> f
> >>>>>>>>>> is a function with zero arguments, could you
invoke it like
> >> this?:
> >>>>>>>>>>
> >>>>>>>>>> select * from f
> >>>>>>>>>>
> >>>>>>>>>> I don’t know the actual rationale. But I know
that the SQL
> >>> standards
> >>>>>>>>>> people in their wisdom decided to add a keyword
to disambiguate.
> >>>>>>>>>>
> >>>>>>>>>> I had to fix some things in Calcite to enable
this:
> >>>>>>>>>> https://github.com/dremio/calcite/pull/1/files
> >>>>>>>>>> Drill uses Frameworks.getPlanner() that does
not seem to be used
> >>> in
> >>>>>>>>>> Calcite for the Maze example.
> >>>>>>>>>> Which is why some hooks were missing.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Can you log a jira case to track this bug?
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> I think I found a bug in Calcite but I'd need
help to fix it.
> >>>>>>>>>> Here is a test that reproduces the problem:
> >>>>>>>>>> https://github.com/apache/calcite/pull/166
> >>>>>>>>>> If we return more than 1 TableFunction with
the same name, we
> >> get
> >>> a
> >>>>>>> NPE
> >>>>>>>>>> later on.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Yes, I knew there was a problem with overloading.
Please log a
> >>> JIRA
> >>>>>>> case
> >>>>>>>>>> on resolution of overloaded functions when invoked
with named
> >>>>>>> arguments.
> >>>>>>>>>> (It probably applies to all functions, not just
table
> >> functions.)
> >>>>> The
> >>>>>>>> fix
> >>>>>>>>>> will take a while (if you wait for me to write
it).
> >>>>>>>>>>
> >>>>>>>>>> For now please tell your users not to overload.
:)
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Julian
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Julien
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> *Jim Scott*
> >>>>>> Director, Enterprise Strategy & Architecture
> >>>>>> +1 (347) 746-9281
> >>>>>> @kingmesal <https://twitter.com/kingmesal>
> >>>>>>
> >>>>>> <http://www.mapr.com/>
> >>>>>> [image: MapR Technologies] <http://www.mapr.com>
> >>>>>>
> >>>>>> Now Available - Free Hadoop On-Demand Training
> >>>>>> <
> >>>>>
> >>>
> >>
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
> >
> >
> > --
> > Julien
>
>


-- 
Julien

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message