drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From James Taylor <jamestay...@apache.org>
Subject Re: select from table with options
Date Sun, 08 Nov 2015 00:44:24 GMT
Phoenix already supports columns at read-time using the syntax without the
EXTENDS keyword as Julian indicated:
   SELECT * FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
   WHERE goldHandicap < 10;

Changing this by requiring the EXTENDS keyword would create a backward
compatibility problem.

I think it'd be good if both of these extensions worked in Drill & Phoenix
given our Drillix initiative.

On Sat, Nov 7, 2015 at 3:34 PM, Jacques Nadeau <jacques@dremio.com> wrote:

> My proposal was an a or b using the freemarker template in the grammar,
> not something later.
>
> Actually, put another way: we may want to consider stating that we only
> incorporate SQL standards in our primary grammar. Any extensions should be
> optional grammar. We could simply have grammar plugins in Calcite (the same
> way we plug in external things in Drill).
>
> Trying to get every project to agree on extensions seems like it may be
> hard.
>
>
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Sat, Nov 7, 2015 at 2:45 PM, Julian Hyde <jhyde@apache.org> wrote:
>
>> I can see why Jacques wants this syntax.
>>
>> However a “switch" in a grammar is a bad idea. Grammars need to be
>> predictable. Any variation should happen at validation time, or later.
>>
>> Also, we shouldn’t add configuration parameters as a way of avoiding a
>> tough design discussion.
>>
>> EXTENDS and eliding TABLE are both extensions to standard SQL, and they
>> are both applicable to Drill and Phoenix. I think Drill and Phoenix (by
>> which I mean Jacques and James, I guess) need to agree what the SQL syntax
>> should be.
>>
>> Julian
>>
>>
>> > On Nov 7, 2015, at 10:40 AM, Jim Scott <jscott@maprtech.com> wrote:
>> >
>> > Looking at those two examples I agree with Jacques. The first appears
>> more
>> > like a hint from the syntactic sugar point of view.
>> >
>> >
>> > On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <jacques@dremio.com>
>> wrote:
>> >
>> >> Since EXTEND is custom functionality, it seems reasonable that we could
>> >> have a switch. Given that SQL Server and Postgres support it seems
>> >> reasonable to support the table functions without the TABLE syntax.
>> >>
>> >> I for one definitely think the TABLE syntax is much more confusing to
>> use,
>> >> especially in the example that we're looking to support, such as:
>> >>
>> >> select * from dfs.`/myfolder/mytable` (type => 'CSV', fieldDelimiter
=>
>> >> '|', skipFirstRow => true)
>> >>
>> >> This seems much clearer than:
>> >>
>> >> select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV',
>> fieldDelimiter
>> >> => '|', skipFirstRow => true))
>> >>
>> >> It also looks much more like a hint to the table (which is our goal).
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >> Jacques Nadeau
>> >> CTO and Co-Founder, Dremio
>> >>
>> >> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jhyde@apache.org> wrote:
>> >>
>> >>> Thanks for doing the legwork and finding what the other vendors do.
>> It is
>> >>> indeed compelling that SQL Server and Postgres go beyond the standard
>> an
>> >>> make the TABLE keyword optional.
>> >>>
>> >>> I tried that syntax in Calcite and discovered that there is a clash
>> with
>> >>> one of our own (few) extensions. In
>> >>> https://issues.apache.org/jira/browse/CALCITE-493 we added the
>> EXTENDS
>> >>> clause. You can write
>> >>>
>> >>>  SELECT *
>> >>>  FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>> >>>  WHERE goldHandicap < 10;
>> >>>
>> >>> to tell Calcite that there are two undeclared columns in the Emp table
>> >> but
>> >>> you would like to use them in this particular query. We chose to make
>> the
>> >>> EXTEND keyword optional, so you could instead write
>> >>>
>> >>>  SELECT *
>> >>>  FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>> >>>  WHERE goldHandicap < 10;
>> >>>
>> >>> That is uncomfortably close to
>> >>>
>> >>>  SELECT *
>> >>>  FROM EmpFunction (favoriteBand, golfHandicap);
>> >>>
>> >>> so we would require
>> >>>
>> >>>  SELECT *
>> >>>  FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
>> >>>
>> >>> if EmpFunction was a table-function. You could combine the two forms
>> like
>> >>> this:
>> >>>
>> >>>  SELECT *
>> >>>  FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND
>> >>> (anotherAttribute INTEGER);
>> >>>
>> >>> We could revisit whether EXTEND is optional, I suppose. But we should
>> >> also
>> >>> ask whether requiring folks to type TABLE is such a hardship.
>> >>>
>> >>> Julian
>> >>>
>> >>>
>> >>>> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <julien@dremio.com>
wrote:
>> >>>>
>> >>>> - Table function syntax: I did a quick search and it seems there's
no
>> >>>> consensus about this.
>> >>>> It seems that Posgres [1] and SQL Server [2] both allow calling
table
>> >>>> functions without the table(...) wrapper while Oracle [3] and DB2
[4]
>> >>>> expect it.
>> >>>> MySQL does not have table functions [5]
>> >>>> 2 for, 2 against and 1 undecided: that's a draw :)
>> >>>> Would it be reasonable to allow a switch in the grammar generation
to
>> >>> have
>> >>>> a posgres compatible syntax? Currently in Drill we use the MySQL
like
>> >>>> syntax (back ticks for identifiers etc)
>> >>>>
>> >>>> [1]
>> >> http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
>> >>>> [2]
>> >> https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
>> >>>> [3] https://oracle-base.com/articles/misc/pipelined-table-functions
>> >>>> [4] http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
>> >>>> [5]
>> >>>>
>> >>>
>> >>
>> http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
>> >>>>
>> >>>> - It seems a simple change in SqlCallBinding fixes the function
>> >>>> overloading: https://github.com/apache/calcite/pull/166/files
>> >>>> But that seems too easy to be true. Possibly this method is called
>> more
>> >>>> than once (before and after the function has been resolved?)
>> >>>>
>> >>>> FYI this would happen only when using named parameter. We do want
to
>> >>>> overload in this case, which is why I'm looking into it.
>> >>>>
>> >>>> I'll fill a JIRA for my other branch
>> >>>>
>> >>>> Julien
>> >>>>
>> >>>> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jhyde@apache.org>
>> wrote:
>> >>>>
>> >>>>>
>> >>>>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <julien@dremio.com>
>> wrote:
>> >>>>>
>> >>>>> TL;DR: TableMacro works for me; I need help with a bug in Calcite
>> when
>> >>>>> there's more than 1 function with the same name.
>> >>>>>
>> >>>>>
>> >>>>> Yes; see below.
>> >>>>>
>> >>>>> FYI: I have a prototype of TableMacro working in Drill. For
now just
>> >>> being
>> >>>>> able to specify the delimiter for csv files.
>> >>>>> So it seem the answer to my question 1) is that TableMacros
are the
>> >> way
>> >>> to
>> >>>>> go.
>> >>>>> I'm still wondering about *3) is the table(...) wrapping syntax
>> >>>>> necessary?*
>> >>>>>
>> >>>>>
>> >>>>> Consider:
>> >>>>>
>> >>>>> select * from myTable as f(x, y)
>> >>>>> select * from myTable f(x, y)
>> >>>>> select * from myFunction(x, y)
>> >>>>>
>> >>>>> #1 and #2 mean the same thing; #2 and #3 look awfully similar.
Also,
>> >> if
>> >>> f
>> >>>>> is a function with zero arguments, could you invoke it like
this?:
>> >>>>>
>> >>>>> select * from f
>> >>>>>
>> >>>>> I don’t know the actual rationale. But I know that the SQL
standards
>> >>>>> people in their wisdom decided to add a keyword to disambiguate.
>> >>>>>
>> >>>>> I had to fix some things in Calcite to enable this:
>> >>>>> https://github.com/dremio/calcite/pull/1/files
>> >>>>> Drill uses Frameworks.getPlanner() that does not seem to be
used in
>> >>>>> Calcite for the Maze example.
>> >>>>> Which is why some hooks were missing.
>> >>>>>
>> >>>>>
>> >>>>> Can you log a jira case to track this bug?
>> >>>>>
>> >>>>>
>> >>>>> I think I found a bug in Calcite but I'd need help to fix it.
>> >>>>> Here is a test that reproduces the problem:
>> >>>>> https://github.com/apache/calcite/pull/166
>> >>>>> If we return more than 1 TableFunction with the same name, we
get a
>> >> NPE
>> >>>>> later on.
>> >>>>>
>> >>>>>
>> >>>>> Yes, I knew there was a problem with overloading. Please log
a JIRA
>> >> case
>> >>>>> on resolution of overloaded functions when invoked with named
>> >> arguments.
>> >>>>> (It probably applies to all functions, not just table functions.)
>> The
>> >>> fix
>> >>>>> will take a while (if you wait for me to write it).
>> >>>>>
>> >>>>> For now please tell your users not to overload. :)
>> >>>>>
>> >>>>>
>> >>>>> Julian
>> >>>>>
>> >>>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> Julien
>> >>>
>> >>>
>> >>
>> >
>> >
>> >
>> > --
>> > *Jim Scott*
>> > Director, Enterprise Strategy & Architecture
>> > +1 (347) 746-9281
>> > @kingmesal <https://twitter.com/kingmesal>
>> >
>> > <http://www.mapr.com/>
>> > [image: MapR Technologies] <http://www.mapr.com>
>> >
>> > Now Available - Free Hadoop On-Demand Training
>> > <
>> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
>> >
>>
>>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message