drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Khurram Faraaz (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4802) NULLS are not first when NULLS FIRST is used with ORDER BY in window definition
Date Mon, 25 Jul 2016 09:07:20 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15391577#comment-15391577
] 

Khurram Faraaz commented on DRILL-4802:
---------------------------------------

I did some more investigation and here is what I have found.
While we do have tests that cover nulls first in our window function tests, we do not have
a test similar to the one 
shown in case (ii) below. So this seems to be a general window functions issue and is not
specific to nested aggregates.

Schema details of parquet file used in below tests

{noformat}
[root@centos-01 parquet-tools]# ./parquet-schema ~/tblWnulls.parquet
message root {
  optional int32 c1;
  optional binary c2 (UTF8);
}
{noformat}

case (i) Order by c1 nulls first, returns correct results. Note that nulls are first in column
c1 for each group in c2.

{noformat}
0: jdbc:drill:schema=dfs.tmp> select AVG(c1) OVER(partition by c2 order by c1 nulls first),
c1, c2 FROM `tblWnulls.parquet`;
+--------------------+-------------+-------+
|       EXPR$0       |     c1      |  c2   |
+--------------------+-------------+-------+
| 0.0                | 0           | a     |
| 0.5                | 1           | a     |
| 2.0                | 5           | a     |
| 4.0                | 10          | a     |
| 5.4                | 11          | a     |
| 6.833333333333333  | 14          | a     |
| 1593.142857142857  | 11111       | a     |
| 2.0                | 2           | b     |
| 5.5                | 9           | b     |
| 8.0                | 13          | b     |
| 10.25              | 17          | b     |
| null               | null        | c     |
| 4.0                | 4           | c     |
| 5.0                | 6           | c     |
| 6.0                | 8           | c     |
| 7.5                | 12          | c     |
| 9.333333333333334  | 13          | c     |
| 9.333333333333334  | 13          | c     |
| null               | null        | d     |
| null               | null        | d     |
| 10.0               | 10          | d     |
| 10.5               | 11          | d     |
| 1.07374182875E9    | 2147483647  | d     |
| 1.07374182875E9    | 2147483647  | d     |
| -1.0               | -1          | e     |
| 7.0                | 15          | e     |
| null               | null        | null  |
| 19.0               | 19          | null  |
| 32777.5            | 65536       | null  |
| 355185.0           | 1000000     | null  |
+--------------------+-------------+-------+
30 rows selected (0.145 seconds)
{noformat}

case (2) order by c2 nulls first, does not return correct results. Note that nulls are NOT
first in column c1 for each group in c2.

{noformat}
0: jdbc:drill:schema=dfs.tmp> select AVG(c1) OVER(partition by c2 order by c2 nulls first),
c1, c2 FROM `tblWnulls.parquet`;
+--------------------+-------------+-------+
|       EXPR$0       |     c1      |  c2   |
+--------------------+-------------+-------+
| 1593.142857142857  | 11111       | a     |
| 1593.142857142857  | 5           | a     |
| 1593.142857142857  | 10          | a     |
| 1593.142857142857  | 11          | a     |
| 1593.142857142857  | 1           | a     |
| 1593.142857142857  | 14          | a     |
| 1593.142857142857  | 0           | a     |
| 10.25              | 17          | b     |
| 10.25              | 9           | b     |
| 10.25              | 13          | b     |
| 10.25              | 2           | b     |
| 9.333333333333334  | 6           | c     |
| 9.333333333333334  | 13          | c     |
| 9.333333333333334  | 8           | c     |
| 9.333333333333334  | null        | c     |
| 9.333333333333334  | 4           | c     |
| 9.333333333333334  | 12          | c     |
| 9.333333333333334  | 13          | c     |
| 1.07374182875E9    | 2147483647  | d     |
| 1.07374182875E9    | null        | d     |
| 1.07374182875E9    | 11          | d     |
| 1.07374182875E9    | null        | d     |
| 1.07374182875E9    | 2147483647  | d     |
| 1.07374182875E9    | 10          | d     |
| 7.0                | 15          | e     |
| 7.0                | -1          | e     |
| 355185.0           | 65536       | null  |
| 355185.0           | 19          | null  |
| 355185.0           | null        | null  |
| 355185.0           | 1000000     | null  |
+--------------------+-------------+-------+
30 rows selected (0.195 seconds)
{noformat}


> NULLS are not first when NULLS FIRST is used with ORDER BY in window definition
> -------------------------------------------------------------------------------
>
>                 Key: DRILL-4802
>                 URL: https://issues.apache.org/jira/browse/DRILL-4802
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.8.0
>         Environment: 4 node CentOS cluster
>            Reporter: Khurram Faraaz
>            Assignee: Khurram Faraaz
>
> NULLS FIRST is not honored when used with ORDER BY inside window definition. This in
a wrong results issue.
> MapR Drill 1.8.0 commit ID : 34ca63ba
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select c2, AVG(SUM(c1)) OVER(partition by c2 order by
c2 nulls first) FROM `tblWnulls.parquet` group by c2;
> +-------+----------------+
> |  c2   |     EXPR$1     |
> +-------+----------------+
> | a     | 11152.0        |
> | b     | 41.0           |
> | c     | 56.0           |
> | d     | 4.294967315E9  |
> | e     | 14.0           |
> | null  | 1065555.0      |
> +-------+----------------+
> 6 rows selected (0.227 seconds)
> {noformat}
> {noformat}
> postgres=# select c2, AVG(SUM(c1)) OVER(partition by c2 order by c2 nulls first) FROM
t222 group by c2;
>  c2 |          avg           
> ----+------------------------
>     |   1065555.000000000000
>  a  | 11152.0000000000000000
>  b  |    41.0000000000000000
>  c  |    56.0000000000000000
>  d  |    4294967315.00000000
>  e  |    14.0000000000000000
> (6 rows)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message