drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Julian Hyde <jh...@apache.org>
Subject Re: select from table with options
Date Tue, 10 Nov 2015 19:51:30 GMT
Can you use both together? Say

  select columns
  from dfs.`/path/to/myfile`(type => 'TEXT', fieldDelimiter => '|’) EXTEND (foo INTEGER)

Julian



> On Nov 10, 2015, at 10:51 AM, Julien Le Dem <julien@dremio.com> wrote:
> 
> I took a stab at adding the TableFunction syntax without table(...) in
> Calcite.
> I have verified that both the table function and extend (with or without
> keyword) work
> https://github.com/julienledem/calcite/commit/b18f335c49e273294c2d475e359c610aaed3da34
> 
> These work:
> 
> select columns from dfs.`/path/to/myfile`(type => 'TEXT', fieldDelimiter =>
> '|')
> 
> select columns from table(dfs.`/path/to/myfile`(type => 'TEXT',
> fieldDelimiter => '|'))
> 
> select columns from table(dfs.`/path/to/myfile`('JSON'))
> 
> select columns from dfs.`/path/to/myfile`('JSON')
> 
> select columns from dfs.`/path/to/myfile`(type => 'JSON')
> 
> On Sat, Nov 7, 2015 at 5:15 PM, Jacques Nadeau <jacques@apache.org> wrote:
> 
>> Drill does implicitly what Phoenix does explicitly so I don't think we
>> should constrain ourselves to having a union of the two syntaxes.
>> 
>> 
>> That being said, I think we could make these work together... maybe.
>> 
>> Remove the EXTENDS without keyword syntax from the grammar.
>> 
>> Create a new sub block in the table block that requires no keyword. There
>> would be two paths (and would probably require some lookahead)
>> 
>> option 1> unnamed parameters (1,2,3)
>> option 2> named parameters (a => 1, b=>2, c=> 3)
>> option 3> create table field pattern (favoriteBand VARCHAR(100),
>> golfHandicap INTEGER)
>> 
>> Then we create a table function with options 1 & 2, an EXTENDS clause for
>> option 3.
>> 
>> Best of both worlds?
>> 
>> On Sat, Nov 7, 2015 at 4:44 PM, James Taylor <jamestaylor@apache.org>
>> wrote:
>> 
>>> Phoenix already supports columns at read-time using the syntax without
>> the
>>> EXTENDS keyword as Julian indicated:
>>>   SELECT * FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>>>   WHERE goldHandicap < 10;
>>> 
>>> Changing this by requiring the EXTENDS keyword would create a backward
>>> compatibility problem.
>>> 
>>> I think it'd be good if both of these extensions worked in Drill &
>> Phoenix
>>> given our Drillix initiative.
>>> 
>>> On Sat, Nov 7, 2015 at 3:34 PM, Jacques Nadeau <jacques@dremio.com>
>> wrote:
>>> 
>>>> My proposal was an a or b using the freemarker template in the grammar,
>>>> not something later.
>>>> 
>>>> Actually, put another way: we may want to consider stating that we only
>>>> incorporate SQL standards in our primary grammar. Any extensions should
>>> be
>>>> optional grammar. We could simply have grammar plugins in Calcite (the
>>> same
>>>> way we plug in external things in Drill).
>>>> 
>>>> Trying to get every project to agree on extensions seems like it may be
>>>> hard.
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Jacques Nadeau
>>>> CTO and Co-Founder, Dremio
>>>> 
>>>> On Sat, Nov 7, 2015 at 2:45 PM, Julian Hyde <jhyde@apache.org> wrote:
>>>> 
>>>>> I can see why Jacques wants this syntax.
>>>>> 
>>>>> However a “switch" in a grammar is a bad idea. Grammars need to be
>>>>> predictable. Any variation should happen at validation time, or later.
>>>>> 
>>>>> Also, we shouldn’t add configuration parameters as a way of avoiding
a
>>>>> tough design discussion.
>>>>> 
>>>>> EXTENDS and eliding TABLE are both extensions to standard SQL, and
>> they
>>>>> are both applicable to Drill and Phoenix. I think Drill and Phoenix
>> (by
>>>>> which I mean Jacques and James, I guess) need to agree what the SQL
>>> syntax
>>>>> should be.
>>>>> 
>>>>> Julian
>>>>> 
>>>>> 
>>>>>> On Nov 7, 2015, at 10:40 AM, Jim Scott <jscott@maprtech.com>
wrote:
>>>>>> 
>>>>>> Looking at those two examples I agree with Jacques. The first
>> appears
>>>>> more
>>>>>> like a hint from the syntactic sugar point of view.
>>>>>> 
>>>>>> 
>>>>>> On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <jacques@dremio.com
>>> 
>>>>> wrote:
>>>>>> 
>>>>>>> Since EXTEND is custom functionality, it seems reasonable that
we
>>> could
>>>>>>> have a switch. Given that SQL Server and Postgres support it
seems
>>>>>>> reasonable to support the table functions without the TABLE syntax.
>>>>>>> 
>>>>>>> I for one definitely think the TABLE syntax is much more confusing
>> to
>>>>> use,
>>>>>>> especially in the example that we're looking to support, such
as:
>>>>>>> 
>>>>>>> select * from dfs.`/myfolder/mytable` (type => 'CSV',
>> fieldDelimiter
>>> =>
>>>>>>> '|', skipFirstRow => true)
>>>>>>> 
>>>>>>> This seems much clearer than:
>>>>>>> 
>>>>>>> select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV',
>>>>> fieldDelimiter
>>>>>>> => '|', skipFirstRow => true))
>>>>>>> 
>>>>>>> It also looks much more like a hint to the table (which is our
>> goal).
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Jacques Nadeau
>>>>>>> CTO and Co-Founder, Dremio
>>>>>>> 
>>>>>>> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jhyde@apache.org>
>>> wrote:
>>>>>>> 
>>>>>>>> Thanks for doing the legwork and finding what the other vendors
>> do.
>>>>> It is
>>>>>>>> indeed compelling that SQL Server and Postgres go beyond
the
>>> standard
>>>>> an
>>>>>>>> make the TABLE keyword optional.
>>>>>>>> 
>>>>>>>> I tried that syntax in Calcite and discovered that there
is a
>> clash
>>>>> with
>>>>>>>> one of our own (few) extensions. In
>>>>>>>> https://issues.apache.org/jira/browse/CALCITE-493 we added
the
>>>>> EXTENDS
>>>>>>>> clause. You can write
>>>>>>>> 
>>>>>>>> SELECT *
>>>>>>>> FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap
INTEGER)
>>>>>>>> WHERE goldHandicap < 10;
>>>>>>>> 
>>>>>>>> to tell Calcite that there are two undeclared columns in
the Emp
>>> table
>>>>>>> but
>>>>>>>> you would like to use them in this particular query. We chose
to
>>> make
>>>>> the
>>>>>>>> EXTEND keyword optional, so you could instead write
>>>>>>>> 
>>>>>>>> SELECT *
>>>>>>>> FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>>>>>>>> WHERE goldHandicap < 10;
>>>>>>>> 
>>>>>>>> That is uncomfortably close to
>>>>>>>> 
>>>>>>>> SELECT *
>>>>>>>> FROM EmpFunction (favoriteBand, golfHandicap);
>>>>>>>> 
>>>>>>>> so we would require
>>>>>>>> 
>>>>>>>> SELECT *
>>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
>>>>>>>> 
>>>>>>>> if EmpFunction was a table-function. You could combine the
two
>> forms
>>>>> like
>>>>>>>> this:
>>>>>>>> 
>>>>>>>> SELECT *
>>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND
>>>>>>>> (anotherAttribute INTEGER);
>>>>>>>> 
>>>>>>>> We could revisit whether EXTEND is optional, I suppose. But
we
>>> should
>>>>>>> also
>>>>>>>> ask whether requiring folks to type TABLE is such a hardship.
>>>>>>>> 
>>>>>>>> Julian
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <julien@dremio.com>
>>> wrote:
>>>>>>>>> 
>>>>>>>>> - Table function syntax: I did a quick search and it
seems
>> there's
>>> no
>>>>>>>>> consensus about this.
>>>>>>>>> It seems that Posgres [1] and SQL Server [2] both allow
calling
>>> table
>>>>>>>>> functions without the table(...) wrapper while Oracle
[3] and DB2
>>> [4]
>>>>>>>>> expect it.
>>>>>>>>> MySQL does not have table functions [5]
>>>>>>>>> 2 for, 2 against and 1 undecided: that's a draw :)
>>>>>>>>> Would it be reasonable to allow a switch in the grammar
>> generation
>>> to
>>>>>>>> have
>>>>>>>>> a posgres compatible syntax? Currently in Drill we use
the MySQL
>>> like
>>>>>>>>> syntax (back ticks for identifiers etc)
>>>>>>>>> 
>>>>>>>>> [1]
>>>>>>> 
>> http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
>>>>>>>>> [2]
>>>>>>> 
>> https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
>>>>>>>>> [3]
>>> https://oracle-base.com/articles/misc/pipelined-table-functions
>>>>>>>>> [4]
>>> http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
>>>>>>>>> [5]
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>> 
>>> 
>> http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
>>>>>>>>> 
>>>>>>>>> - It seems a simple change in SqlCallBinding fixes the
function
>>>>>>>>> overloading: https://github.com/apache/calcite/pull/166/files
>>>>>>>>> But that seems too easy to be true. Possibly this method
is
>> called
>>>>> more
>>>>>>>>> than once (before and after the function has been resolved?)
>>>>>>>>> 
>>>>>>>>> FYI this would happen only when using named parameter.
We do want
>>> to
>>>>>>>>> overload in this case, which is why I'm looking into
it.
>>>>>>>>> 
>>>>>>>>> I'll fill a JIRA for my other branch
>>>>>>>>> 
>>>>>>>>> Julien
>>>>>>>>> 
>>>>>>>>> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jhyde@apache.org>
>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <julien@dremio.com>
>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> TL;DR: TableMacro works for me; I need help with
a bug in
>> Calcite
>>>>> when
>>>>>>>>>> there's more than 1 function with the same name.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Yes; see below.
>>>>>>>>>> 
>>>>>>>>>> FYI: I have a prototype of TableMacro working in
Drill. For now
>>> just
>>>>>>>> being
>>>>>>>>>> able to specify the delimiter for csv files.
>>>>>>>>>> So it seem the answer to my question 1) is that TableMacros
are
>>> the
>>>>>>> way
>>>>>>>> to
>>>>>>>>>> go.
>>>>>>>>>> I'm still wondering about *3) is the table(...) wrapping
syntax
>>>>>>>>>> necessary?*
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Consider:
>>>>>>>>>> 
>>>>>>>>>> select * from myTable as f(x, y)
>>>>>>>>>> select * from myTable f(x, y)
>>>>>>>>>> select * from myFunction(x, y)
>>>>>>>>>> 
>>>>>>>>>> #1 and #2 mean the same thing; #2 and #3 look awfully
similar.
>>> Also,
>>>>>>> if
>>>>>>>> f
>>>>>>>>>> is a function with zero arguments, could you invoke
it like
>> this?:
>>>>>>>>>> 
>>>>>>>>>> select * from f
>>>>>>>>>> 
>>>>>>>>>> I don’t know the actual rationale. But I know that
the SQL
>>> standards
>>>>>>>>>> people in their wisdom decided to add a keyword to
disambiguate.
>>>>>>>>>> 
>>>>>>>>>> I had to fix some things in Calcite to enable this:
>>>>>>>>>> https://github.com/dremio/calcite/pull/1/files
>>>>>>>>>> Drill uses Frameworks.getPlanner() that does not
seem to be used
>>> in
>>>>>>>>>> Calcite for the Maze example.
>>>>>>>>>> Which is why some hooks were missing.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Can you log a jira case to track this bug?
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> I think I found a bug in Calcite but I'd need help
to fix it.
>>>>>>>>>> Here is a test that reproduces the problem:
>>>>>>>>>> https://github.com/apache/calcite/pull/166
>>>>>>>>>> If we return more than 1 TableFunction with the same
name, we
>> get
>>> a
>>>>>>> NPE
>>>>>>>>>> later on.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Yes, I knew there was a problem with overloading.
Please log a
>>> JIRA
>>>>>>> case
>>>>>>>>>> on resolution of overloaded functions when invoked
with named
>>>>>>> arguments.
>>>>>>>>>> (It probably applies to all functions, not just table
>> functions.)
>>>>> The
>>>>>>>> fix
>>>>>>>>>> will take a while (if you wait for me to write it).
>>>>>>>>>> 
>>>>>>>>>> For now please tell your users not to overload. :)
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Julian
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> Julien
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> *Jim Scott*
>>>>>> Director, Enterprise Strategy & Architecture
>>>>>> +1 (347) 746-9281
>>>>>> @kingmesal <https://twitter.com/kingmesal>
>>>>>> 
>>>>>> <http://www.mapr.com/>
>>>>>> [image: MapR Technologies] <http://www.mapr.com>
>>>>>> 
>>>>>> Now Available - Free Hadoop On-Demand Training
>>>>>> <
>>>>> 
>>> 
>> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 
> 
> 
> -- 
> Julien


Mime
View raw message