drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jinfeng Ni (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DRILL-3680) window function query returns Incorrect results
Date Thu, 10 Sep 2015 23:54:47 GMT

     [ https://issues.apache.org/jira/browse/DRILL-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jinfeng Ni updated DRILL-3680:
------------------------------
    Assignee: Aman Sinha  (was: Jinfeng Ni)

> window function query returns Incorrect results 
> ------------------------------------------------
>
>                 Key: DRILL-3680
>                 URL: https://issues.apache.org/jira/browse/DRILL-3680
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.2.0
>         Environment: private-branch https://github.com/adeneche/incubator-drill/tree/new-window-funcs
>            Reporter: Khurram Faraaz
>            Assignee: Aman Sinha
>            Priority: Critical
>              Labels: window_function
>             Fix For: 1.2.0
>
>         Attachments: 0001-DRILL-3680-Fix-incorrect-query-result-or-IOBE-when-t.patch,
0001-DRILL-3680-Fix-incorrect-query-result-or-IOBE-when-t.patch.2, tblWnulls.parquet
>
>
> Query plan from Drill for the query that returns wrong results
> {code}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select c1 , c2 , lead(c2) OVER ( PARTITION
BY c2 ORDER BY c1) lead_c2 FROM (SELECT c1 , c2, ntile(3) over(PARTITION BY c2 ORDER BY c1)
FROM `tblWnulls.parquet`);
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(c1=[$0], c2=[$1], lead_c2=[$2])
> 00-02        Project(c1=[$0], c2=[$1], lead_c2=[$2])
> 00-03          Project(c1=[$0], c2=[$1], $2=[$3])
> 00-04            Window(window#0=[window(partition {1} order by [0] range between UNBOUNDED
PRECEDING and CURRENT ROW aggs [LEAD($1)])])
> 00-05              Window(window#0=[window(partition {1} order by [0] range between UNBOUNDED
PRECEDING and CURRENT ROW aggs [NTILE($2)])])
> 00-06                SelectionVectorRemover
> 00-07                  Sort(sort0=[$1], sort1=[$0], dir0=[ASC], dir1=[ASC])
> 00-08                    Project(c1=[$1], c2=[$0])
> 00-09                      Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
[path=maprfs:///tmp/tblWnulls.parquet]], selectionRoot=maprfs:/tmp/tblWnulls.parquet, numFiles=1,
columns=[`c1`, `c2`]]])
> {code}
> Results returned by Drill.
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select c1 , c2 , lead(c2) OVER ( PARTITION BY c2 ORDER
BY c1) lead_c2 FROM (SELECT c1 , c2, ntile(3) over(PARTITION BY c2 ORDER BY c1) FROM `tblWnulls.parquet`);
> +-------------+-------+----------+
> |     c1      |  c2   | lead_c2  |
> +-------------+-------+----------+
> | 0           | a     | null     |
> | 1           | a     | null     |
> | 5           | a     | null     |
> | 10          | a     | null     |
> | 11          | a     | null     |
> | 14          | a     | null     |
> | 11111       | a     | null     |
> | 2           | b     | null     |
> | 9           | b     | null     |
> | 13          | b     | null     |
> | 17          | b     | null     |
> | 4           | c     | null     |
> | 6           | c     | null     |
> | 8           | c     | null     |
> | 12          | c     | null     |
> | 13          | c     | null     |
> | 13          | c     | null     |
> | null        | c     | null     |
> | 10          | d     | null     |
> | 11          | d     | null     |
> | 2147483647  | d     | null     |
> | 2147483647  | d     | null     |
> | null        | d     | null     |
> | null        | d     | null     |
> | -1          | e     | null     |
> | 15          | e     | null     |
> | 19          | null  | null     |
> | 65536       | null  | null     |
> | 1000000     | null  | null     |
> | null        | null  | null     |
> +-------------+-------+----------+
> 30 rows selected (0.339 seconds)
> {code}
> Results returned by Postgres
> {code}
> postgres=# select c1 , c2 , lead(c2) OVER ( PARTITION BY c2 ORDER BY c1) lead_c2 FROM
(SELECT c1 , c2, ntile(3) over(PARTITION BY c2 ORDER BY c1) FROM t222) sub_query;
>      c1     | c2 | lead_c2 
> ------------+----+---------
>           0 | a  | a
>           1 | a  | a
>           5 | a  | a
>          10 | a  | a
>          11 | a  | a
>          14 | a  | a
>       11111 | a  | 
>           2 | b  | b
>           9 | b  | b
>          13 | b  | b
>          17 | b  | 
>           4 | c  | c
>           6 | c  | c
>           8 | c  | c
>          12 | c  | c
>          13 | c  | c
>          13 | c  | c
>             | c  | 
>          10 | d  | d
>          11 | d  | d
>  2147483647 | d  | d
>  2147483647 | d  | d
>             | d  | d
>             | d  | 
>          -1 | e  | e
>          15 | e  | 
>          19 |    | 
>       65536 |    | 
>     1000000 |    | 
>             |    | 
> (30 rows)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message