drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sean Hsuan-Yi Chu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-3567) Wrong result in a query with multiple window functions and different over clauses
Date Wed, 29 Jul 2015 22:45:04 GMT

    [ https://issues.apache.org/jira/browse/DRILL-3567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646873#comment-14646873
] 

Sean Hsuan-Yi Chu commented on DRILL-3567:
------------------------------------------

The cause resides in Calcite (will file an issue shortly). 

The reason is that, in [~vicky]'s original query, the second column has its own window while
the other two share the same one. Since that window is shared, Calcite's LogicalWindow puts
these two window functions together, which unfortunately changes the order.

The solution is to let the LogicalProject above be aware of that. 

Let me try to work on it.

> Wrong result in a query with multiple window functions and different over clauses
> ---------------------------------------------------------------------------------
>
>                 Key: DRILL-3567
>                 URL: https://issues.apache.org/jira/browse/DRILL-3567
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.1.0
>         Environment: private-branch-with-multiple-partitions-enabled
>            Reporter: Victoria Markman
>            Assignee: Sean Hsuan-Yi Chu
>            Priority: Critical
>              Labels: window_function
>         Attachments: t1_parquet
>
>
> {code}
> 0: jdbc:drill:drillbit=localhost> select * from t1;
> +-------+--------+-------------+
> |  a1   |   b1   |     c1      |
> +-------+--------+-------------+
> | 1     | aaaaa  | 2015-01-01  |
> | 2     | bbbbb  | 2015-01-02  |
> | 3     | ccccc  | 2015-01-03  |
> | 4     | null   | 2015-01-04  |
> | 5     | eeeee  | 2015-01-05  |
> | 6     | fffff  | 2015-01-06  |
> | 7     | ggggg  | 2015-01-07  |
> | null  | hhhhh  | 2015-01-08  |
> | 9     | iiiii  | null        |
> | 10    | jjjjj  | 2015-01-10  |
> +-------+--------+-------------+
> 10 rows selected (0.078 seconds)
> {code}
> Wrong result, columns are projected in the wrong order:
> {code}
> 0: jdbc:drill:drillbit=localhost> select
> . . . . . . . . . . . . . . . . >         count(*) over(partition by b1 order by c1)
as count1,
> . . . . . . . . . . . . . . . . >         count(*) over(partition by a1 order by c1)
as count2,
> . . . . . . . . . . . . . . . . >         sum(a1)  over(partition by b1 order by c1)
as sum1
> . . . . . . . . . . . . . . . . > from 
> . . . . . . . . . . . . . . . . >         t1;
> +---------+---------+-------+
> | count1  | count2  | sum1  |
> +---------+---------+-------+
> | 1       | 1       | 1     |
> | 1       | 2       | 1     |
> | 1       | 3       | 1     |
> | 1       | 4       | 1     |
> | 1       | 5       | 1     |
> | 1       | 6       | 1     |
> | 1       | 7       | 1     |
> | 1       | 9       | 1     |
> | 1       | 10      | 1     |
> | 1       | null    | 1     |
> +---------+---------+-------+
> 10 rows selected (0.113 seconds)
> {code}
> Explain plan:
> {code}
> 0: jdbc:drill:drillbit=localhost> explain plan for select
> . . . . . . . . . . . . . . . . >         count(*) over(partition by b1 order by c1)
as count1,
> . . . . . . . . . . . . . . . . >         count(*) over(partition by a1 order by c1)
as count2,
> . . . . . . . . . . . . . . . . >         sum(a1)  over(partition by b1 order by c1)
as sum1
> . . . . . . . . . . . . . . . . > from 
> . . . . . . . . . . . . . . . . >         t1;
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      ProjectAllowDup(count1=[$0], count2=[$1], sum1=[$2])
> 00-02        Project(w0$o0=[$4], w0$o1=[$5], w1$o0=[$6])
> 00-03          Window(window#0=[window(partition {3} order by [2] range between UNBOUNDED
PRECEDING and CURRENT ROW aggs [COUNT()])])
> 00-04            SelectionVectorRemover
> 00-05              Sort(sort0=[$3], sort1=[$2], dir0=[ASC], dir1=[ASC])
> 00-06                Window(window#0=[window(partition {1} order by [2] range between
UNBOUNDED PRECEDING and CURRENT ROW aggs [COUNT(), SUM($3)])])
> 00-07                  SelectionVectorRemover
> 00-08                    Sort(sort0=[$1], sort1=[$2], dir0=[ASC], dir1=[ASC])
> 00-09                      Project(T61¦¦*=[$0], b1=[$1], c1=[$2], a1=[$3])
> 00-10                        Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
[path=file:/Users/vmarkman/drill/testdata/subqueries/t1]], selectionRoot=file:/Users/vmarkman/drill/testdata/subqueries/t1,
numFiles=1, columns=[`*`]]])
> {code}
> If you remove frame that is not the same as other two, query works correctly:
> {code}
> 0: jdbc:drill:drillbit=localhost> select
> . . . . . . . . . . . . . . . . >         count(*) over(partition by b1 order by c1)
as count1,
> . . . . . . . . . . . . . . . . >         sum(a1)  over(partition by b1 order by c1)
as sum1
> . . . . . . . . . . . . . . . . > from
> . . . . . . . . . . . . . . . . >         t1;
> +---------+-------+
> | count1  | sum1  |
> +---------+-------+
> | 1       | 1     |
> | 1       | 2     |
> | 1       | 3     |
> | 1       | 5     |
> | 1       | 6     |
> | 1       | 7     |
> | 1       | null  |
> | 1       | 9     |
> | 1       | 10    |
> | 1       | 4     |
> +---------+-------+
> 10 rows selected (0.099 seconds)
> {code}
> and in the different order (just for fun) :
> {code}
> 0: jdbc:drill:drillbit=localhost> select
> . . . . . . . . . . . . . . . . >         sum(a1)  over(partition by b1 order by c1)
as sum1,
> . . . . . . . . . . . . . . . . >         count(*) over(partition by b1 order by c1)
as count1
> . . . . . . . . . . . . . . . . > from 
> . . . . . . . . . . . . . . . . >         t1;
> +-------+---------+
> | sum1  | count1  |
> +-------+---------+
> | 1     | 1       |
> | 2     | 1       |
> | 3     | 1       |
> | 5     | 1       |
> | 6     | 1       |
> | 7     | 1       |
> | null  | 1       |
> | 9     | 1       |
> | 10    | 1       |
> | 4     | 1       |
> +-------+---------+
> 10 rows selected (0.096 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message