drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Victoria Markman (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-3652) Need to document order of operations with window functions and flatten
Date Fri, 14 Aug 2015 22:38:46 GMT
Victoria Markman created DRILL-3652:
---------------------------------------

             Summary: Need to document order of operations with window functions and flatten
                 Key: DRILL-3652
                 URL: https://issues.apache.org/jira/browse/DRILL-3652
             Project: Apache Drill
          Issue Type: Improvement
          Components: Documentation
    Affects Versions: 1.2.0
            Reporter: Victoria Markman
            Assignee: Bridget Bevens


In standard SQL, window functions are the last set of operations performed in a query except
for the final order by clause. 
Using window function with flatten is a bit confusing, because it appears as an operator in
the query plan and I expected flatten to run first followed by a window function.

This is not what is happening:
{code}
0: jdbc:drill:schema=dfs> select * from `complex.json`;
+----+-----------+----------+
| x  |     y     |    z     |
+----+-----------+----------+
| 5  | a string  | [1,2,3]  |
+----+-----------+----------+
1 row selected (0.128 seconds)

0: jdbc:drill:schema=dfs> select sum(x) over(), x , y, flatten(z) from `complex.json`;
+---------+----+-----------+---------+
| EXPR$0  | x  |     y     | EXPR$3  |
+---------+----+-----------+---------+
| 5       | 5  | a string  | 1       |
| 5       | 5  | a string  | 2       |
| 5       | 5  | a string  | 3       |
+---------+----+-----------+---------+
3 rows selected (0.152 seconds)

0: jdbc:drill:schema=dfs> explain plan for select sum(x) over(), x , y, flatten(z) from
`complex.json`;
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      ProjectAllowDup(EXPR$0=[$0], x=[$1], y=[$2], EXPR$3=[$3])
00-02        Project(w0$o0=[$3], x=[$0], y=[$1], EXPR$3=[$4])
00-03          Flatten(flattenField=[$4])
00-04            Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$3], EXPR$5=[$2])
00-05              Project(x=[$1], y=[$2], z=[$3], w0$o0=[$4])
00-06                Window(window#0=[window(partition {} order by [] range between UNBOUNDED
PRECEDING and UNBOUNDED FOLLOWING aggs [SUM($1)])])
00-07                  Project(T38¦¦*=[$0], x=[$1], y=[$2], z=[$3])
00-08                    Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/drill/testdata/subqueries/complex.json,
numFiles=1, columns=[`*`], files=[maprfs:///drill/testdata/subqueries/complex.json]]]
{code}

We should suggest to users to put flatten in a subquery if they want to run window function
on top of the result set returned by flatten.

{code}
0: jdbc:drill:schema=dfs> select x, y, a, sum(x) over() from  ( select x , y, flatten(z)
as a from `complex.json`);
+----+-----------+----+---------+
| x  |     y     | a  | EXPR$3  |
+----+-----------+----+---------+
| 5  | a string  | 1  | 15      |
| 5  | a string  | 2  | 15      |
| 5  | a string  | 3  | 15      |
+----+-----------+----+---------+
3 rows selected (0.145 seconds)
{code}

I suggest we document this issue in the window function section, perhaps in "Usage notes".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message