drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jason Altekruse (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-3595) Wrong results returned by query that uses LEAD window function
Date Tue, 04 Aug 2015 00:36:04 GMT

    [ https://issues.apache.org/jira/browse/DRILL-3595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652856#comment-14652856
] 

Jason Altekruse commented on DRILL-3595:
----------------------------------------

One thing we should make sure to watch out for as we generate test cases, the limit operation
should be used very sparingly. The primary reason for this is that limit is defined at the
SQL level to guarantee nothing about where in the result set you are getting the records from.
Small cases where there is no parallelization can avoid these issues as a product of how Drill
is implemented today, but this should not be relied upon to write robust tests. In reality,
operations in Drill happen on several threads and across machines, the order that records
will be returned from operations other than a global sort at the end of the query is not deterministic.

Here the window function used specifies an order by, but this does not enforce anything about
a global ordering of all of the records produced by the evaluation of the window function.
This only enforces order within each partition.

This is very likely a legitimate bug, but the better way to test it would be to read a smaller
input file, or run a validation against the entire large result set.

> Wrong results returned by query that uses LEAD window function
> --------------------------------------------------------------
>
>                 Key: DRILL-3595
>                 URL: https://issues.apache.org/jira/browse/DRILL-3595
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Codegen, Execution - Flow
>    Affects Versions: 1.2.0
>         Environment: private-branch-new-window-funcs
>            Reporter: Khurram Faraaz
>            Assignee: Deneche A. Hakim
>              Labels: window_function
>             Fix For: 1.2.0
>
>         Attachments: union_01.parquet
>
>
> Query that uses LEAD window function returns wrong results on developer's private branch
(new-window-funcs).
> Results returned by Drill
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select lead(c1) over w from union_01 window w as (partition
by c3 order by c1) limit 10;
> +---------+
> | EXPR$0  |
> +---------+
> | 878     |
> | -150    |
> | 402     |
> | 402     |
> | 402     |
> | 402     |
> | 402     |
> | 160     |
> | 160     |
> | 160     |
> +---------+
> 10 rows selected (0.349 seconds)
> {code}
> Results returned by Postgres
> {code}
> postgres=# select lead(c1) over w from union_01 window w as (partition by c3 order by
c1) limit 10;
>  lead 
> ------
>      
>      
>   402
>   402
>   402
>   402
>   402
>      
>   160
>   160
> (10 rows)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message