drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jacques Nadeau <jacq...@dremio.com>
Subject Re: flatten() function, scalar functions, nested ?
Date Mon, 12 Oct 2015 21:57:01 GMT
It's funny, I've been looking at this stuff recently in the thinking about
trying to fit Flatten into Calcite's UDTF construct.

My inclination is that we basically have sugared the syntax for a lateral
join with a udtf. I think this is a useful syntax when restricted to the
select clause and think we should make it more general to all udtf's. What
do others think? I think we should also add support for the more
traditional syntax structure.

Have you looked at Calcite's planning of lateral join and UDTFs as we may
be able to push this sugared syntax into a clearer pattern.

With regards to not allowing expression trees with flatten, I'm not sure
that is a required constraint.

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Mon, Oct 12, 2015 at 2:44 PM, Jinfeng Ni <jinfengni99@gmail.com> wrote:

> Drill currently supports flatten(), which flattens elements in a
> repeated field into a set of individual records [1].
>
> My understanding is that flatten is not part of SQL standard (SQL
> standard uses UNNEST). It's a concept borrowed from Google's
> BigQuery[2]. However, the way Drill uses flatten seems to be
> "different" from the way flatten is used in BiqQuery, or UNNEST in
> SQL.
>
> 1) In Drill, flatten is more like a regular function; it could appear
> in SELECT clause, ORDER BY, GROUP BY, but NOT in FROM clause.
>     In contrast, in BigQuery, flatten seems to only appear in the FROM
> clause, in a way more like table function.
>
> 2) Because Drill treats flatten like a regular function, it could be
> nested with a regular scalar function. I could not find such usage in
> BigQuery (Please point me an example if someone finds an example).
>
> As such, Drill currently allows the following:
>
> Assuming (T1 has one row only and T1.intList  : [ 1, 2, 3] )
>
> Q1:
> select flatten(T1.intList) + 1
> from
> T1;
>
> Would return 3 rows:
> 2,
> 3,
> 4
>
> To me, it is kind of unusual usage to 1) allow flatten to appear
> anywhere where a regular scalar function would appear, 2) allow
> flatten to be nested with regular scalar function.
>
> Such "flexibility"  probably has not been well tested. For instance,
> if I put flatten() in WHERE clause, Drill would throw function
> resolution error.
>
> Also, I'm not sure how Drill would do if we put flatten() nested with
> regular scalar function inside window aggregate function, with clause
> (CTE), etc?
>
> My question is : should Drill restrict flatten such that
>
> 1) it will only appear in SELECT clause for now (I understand it would
> require significant re-work if we want to use BiqQuery way and use
> flatten in FROM clause).
> 2) it would not allow flatten to be nested with regular scalar function.
>
> With such restriction, the above Q1 could be rewritten into:
>
> select  f + 1
> from ( select flatten(T1.intList) as f
>            from T1)
>
> I understand there is some discussion to improve flatten() in some
> JIRAs.  What does other think about adding this restriction to
> flatten?
>
>
> [1]. https://drill.apache.org/docs/flatten/
> [2]. https://cloud.google.com/bigquery/docs/data#flatten
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message