drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jinfeng Ni <jinfengn...@gmail.com>
Subject Re: flatten() function, scalar functions, nested ?
Date Mon, 12 Oct 2015 23:55:22 GMT
The first 3 items is same as what I would think of flatten, if we want
to Drill to perform in a SQL way.

The last item is something that drives me to send the first email.
That is, whether Drill should allow the flexibility of such shortcut,
not only in SELECT clause, but also in other places including
OB/GB/WHERE etc.

Adding one layer of sql rewrite probably could get the last item go
through Calcite, although I'm not clear how to achieve that in Calcite
as an extension.  For instance, where should such rewrite happen,
right after sql validation, but before sql2rel converter?

The thing is, for now such rewrite happens after Calcite, which seems
to cause various of problem. That's, in some case, such shortcut works
perfectly, while in other cases, such shortcut does not work. My
original thinking is whether it makes sense to block such shortcut
until we could implement the items you suggest.



On Mon, Oct 12, 2015 at 3:42 PM, Jacques Nadeau <jacques@dremio.com> wrote:
> I was recommending the following:
>
> - flatten is table function
> - we should support it in traditional table function syntax
> - we should support lateral join/lateral subqueries
> - we have shortcut for a lateral join combined with a table function used
> in the select clause
>
> The last item could be done as some kind of sql rewrite. Either add support
> directly to Calcite or as an extension. We should have never forked Calcite
> (my mistake) and need to get back onto master. I don't see why this
> suggests otherwise.
>
>
>
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Mon, Oct 12, 2015 at 3:27 PM, Jinfeng Ni <jinfengni99@gmail.com> wrote:
>
>> The first question I have is what is flatten() function. Is it table
>> functions or regular function?
>>
>> If Drill wants to model flatten() as table function, and want to fit
>> flatten() into Calcite's table function structure, then I guess we had
>> better add such restriction.
>>
>> Seems to me Julian's on-going work for table function is similar to
>> what I understand for table function. 1) it appears in FROM clause, 2)
>> As such, it would not nest a table function with a regular function.
>>
>> By "lateral join", are you referring to lateral subqueries [1]?
>> Lateral subqueries seems to also appear in FROM clause.
>>
>> So, if we want Drill to allow the sugared syntax for flatten, but also
>> want to use Calcite's UDTF, my sense is that it is yet another reason
>> for us to use forked Calcite, simply because the way Drill wants is
>> different from what normal SQL planner would do.
>>
>>
>>
>> [1].
>> http://www.postgresql.org/docs/9.4/static/queries-table-expressions.html
>>
>> On Mon, Oct 12, 2015 at 2:57 PM, Jacques Nadeau <jacques@dremio.com>
>> wrote:
>> > It's funny, I've been looking at this stuff recently in the thinking
>> about
>> > trying to fit Flatten into Calcite's UDTF construct.
>> >
>> > My inclination is that we basically have sugared the syntax for a lateral
>> > join with a udtf. I think this is a useful syntax when restricted to the
>> > select clause and think we should make it more general to all udtf's.
>> What
>> > do others think? I think we should also add support for the more
>> > traditional syntax structure.
>> >
>> > Have you looked at Calcite's planning of lateral join and UDTFs as we may
>> > be able to push this sugared syntax into a clearer pattern.
>> >
>> > With regards to not allowing expression trees with flatten, I'm not sure
>> > that is a required constraint.
>> >
>> > --
>> > Jacques Nadeau
>> > CTO and Co-Founder, Dremio
>> >
>> > On Mon, Oct 12, 2015 at 2:44 PM, Jinfeng Ni <jinfengni99@gmail.com>
>> wrote:
>> >
>> >> Drill currently supports flatten(), which flattens elements in a
>> >> repeated field into a set of individual records [1].
>> >>
>> >> My understanding is that flatten is not part of SQL standard (SQL
>> >> standard uses UNNEST). It's a concept borrowed from Google's
>> >> BigQuery[2]. However, the way Drill uses flatten seems to be
>> >> "different" from the way flatten is used in BiqQuery, or UNNEST in
>> >> SQL.
>> >>
>> >> 1) In Drill, flatten is more like a regular function; it could appear
>> >> in SELECT clause, ORDER BY, GROUP BY, but NOT in FROM clause.
>> >>     In contrast, in BigQuery, flatten seems to only appear in the FROM
>> >> clause, in a way more like table function.
>> >>
>> >> 2) Because Drill treats flatten like a regular function, it could be
>> >> nested with a regular scalar function. I could not find such usage in
>> >> BigQuery (Please point me an example if someone finds an example).
>> >>
>> >> As such, Drill currently allows the following:
>> >>
>> >> Assuming (T1 has one row only and T1.intList  : [ 1, 2, 3] )
>> >>
>> >> Q1:
>> >> select flatten(T1.intList) + 1
>> >> from
>> >> T1;
>> >>
>> >> Would return 3 rows:
>> >> 2,
>> >> 3,
>> >> 4
>> >>
>> >> To me, it is kind of unusual usage to 1) allow flatten to appear
>> >> anywhere where a regular scalar function would appear, 2) allow
>> >> flatten to be nested with regular scalar function.
>> >>
>> >> Such "flexibility"  probably has not been well tested. For instance,
>> >> if I put flatten() in WHERE clause, Drill would throw function
>> >> resolution error.
>> >>
>> >> Also, I'm not sure how Drill would do if we put flatten() nested with
>> >> regular scalar function inside window aggregate function, with clause
>> >> (CTE), etc?
>> >>
>> >> My question is : should Drill restrict flatten such that
>> >>
>> >> 1) it will only appear in SELECT clause for now (I understand it would
>> >> require significant re-work if we want to use BiqQuery way and use
>> >> flatten in FROM clause).
>> >> 2) it would not allow flatten to be nested with regular scalar function.
>> >>
>> >> With such restriction, the above Q1 could be rewritten into:
>> >>
>> >> select  f + 1
>> >> from ( select flatten(T1.intList) as f
>> >>            from T1)
>> >>
>> >> I understand there is some discussion to improve flatten() in some
>> >> JIRAs.  What does other think about adding this restriction to
>> >> flatten?
>> >>
>> >>
>> >> [1]. https://drill.apache.org/docs/flatten/
>> >> [2]. https://cloud.google.com/bigquery/docs/data#flatten
>> >>
>>

Mime
View raw message