drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Khurram Faraaz (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-3668) Incorrect results FIRST_VALUE function
Date Thu, 27 Aug 2015 21:54:45 GMT

    [ https://issues.apache.org/jira/browse/DRILL-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717600#comment-14717600
] 

Khurram Faraaz commented on DRILL-3668:
---------------------------------------

Verified on private branch that the below two queries give correct results. I will verify
on master once the fix is available.
commitID on private branch : ce658fe1

{code}
0: jdbc:drill:schema=dfs.tmp> SELECT * FROM (SELECT c1, c2, lead(c2) OVER ( PARTITION BY
c2 ORDER BY c1) lead_c2, lag(c2) OVER ( PARTITION BY c2 ORDER BY c1) lag_c2, ntile(3) OVER
( PARTITION BY c2 ORDER BY c1) tile, first_value(c2) OVER ( PARTITION BY c2 ORDER BY c1) firstVal_c2,
last_value(c2) OVER ( PARTITION BY c2 ORDER BY c1) lastVal_c2 FROM `tblWnulls.parquet`) sub_query
where firstVal_c2 = 'e' ORDER BY tile, c1;
+-----+-----+----------+---------+-------+--------------+-------------+
| c1  | c2  | lead_c2  | lag_c2  | tile  | firstVal_c2  | lastVal_c2  |
+-----+-----+----------+---------+-------+--------------+-------------+
| -1  | e   | e        | null    | 1     | e            | e           |
| 15  | e   | null     | e       | 2     | e            | e           |
+-----+-----+----------+---------+-------+--------------+-------------+
2 rows selected (2.078 seconds)
{code}

{code}
0: jdbc:drill:schema=dfs.tmp> SELECT * FROM (SELECT c1, c2, lead(c2) OVER ( PARTITION BY
c2 ORDER BY c1) lead_c2, lag(c2) OVER ( PARTITION BY c2 ORDER BY c1) lag_c2, ntile(3) OVER
( PARTITION BY c2 ORDER BY c1) tile, first_value(c2) OVER ( PARTITION BY c2 ORDER BY c1) firstVal_c2,
last_value(c2) OVER ( PARTITION BY c2 ORDER BY c1) lastVal_c2 FROM `tblWnulls.parquet`) sub_query
where firstVal_c2 is null ORDER BY tile, c1;
+----------+-------+----------+---------+-------+--------------+-------------+
|    c1    |  c2   | lead_c2  | lag_c2  | tile  | firstVal_c2  | lastVal_c2  |
+----------+-------+----------+---------+-------+--------------+-------------+
| 19       | null  | null     | null    | 1     | null         | null        |
| 65536    | null  | null     | null    | 1     | null         | null        |
| 1000000  | null  | null     | null    | 2     | null         | null        |
| null     | null  | null     | null    | 3     | null         | null        |
+----------+-------+----------+---------+-------+--------------+-------------+
4 rows selected (0.499 seconds)
{code}

> Incorrect results FIRST_VALUE function
> --------------------------------------
>
>                 Key: DRILL-3668
>                 URL: https://issues.apache.org/jira/browse/DRILL-3668
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.2.0
>         Environment: private-branch https://github.com/adeneche/incubator-drill/tree/new-window-funcs
>            Reporter: Khurram Faraaz
>            Assignee: Aman Sinha
>            Priority: Critical
>              Labels: window_function
>             Fix For: 1.2.0
>
>
> Query returns incorrect results for first_value column on developers private branch.
> Query results from Drill (6 rows)
> {code}
> 0: jdbc:drill:schema=dfs.tmp> SELECT * FROM (SELECT c1, c2, lead(c2) OVER ( PARTITION
BY c2 ORDER BY c1) lead_c2, lag(c2) OVER ( PARTITION BY c2 ORDER BY c1) lag_c2, ntile(3) OVER
( PARTITION BY c2 ORDER BY c1) tile, first_value(c2) OVER ( PARTITION BY c2 ORDER BY c1) firstVal_c2,
last_value(c2) OVER ( PARTITION BY c2 ORDER BY c1) lastVal_c2 FROM `tblWnulls.parquet`) sub_query
where firstVal_c2 = 'e' ORDER BY tile, c1;
> +----------+-------+----------+---------+-------+--------------+-------------+
> |    c1    |  c2   | lead_c2  | lag_c2  | tile  | firstVal_c2  | lastVal_c2  |
> +----------+-------+----------+---------+-------+--------------+-------------+
> | -1       | e     | e        | null    | 1     | e            | e           |
> | 19       | null  | null     | null    | 1     | e            | null        |
> | 65536    | null  | null     | null    | 1     | e            | null        |
> | 15       | e     | null     | e       | 2     | e            | e           |
> | 1000000  | null  | null     | null    | 2     | e            | null        |
> | null     | null  | null     | null    | 3     | e            | null        |
> +----------+-------+----------+---------+-------+--------------+-------------+
> 6 rows selected (0.269 seconds)
> {code}
> Query results from Postgres for same input data (2 rows returned)
> {code}
> postgres=# SELECT * FROM (SELECT c1, c2, lead(c2) OVER ( PARTITION BY c2 ORDER BY c1)
lead_c2, lag(c2) OVER ( PARTITION BY c2 ORDER BY c1) lag_c2, ntile(3) OVER ( PARTITION BY
c2 ORDER BY c1) tile, first_value(c2) OVER ( PARTITION BY c2 ORDER BY c1) firstVal_c2, last_value(c2)
OVER ( PARTITION BY c2 ORDER BY c1) lastVal_c2 FROM t222) sub_query where firstVal_c2 = 'e'
ORDER BY tile, c1;
>  c1 | c2 | lead_c2 | lag_c2 | tile | firstval_c2 | lastval_c2 
> ----+----+---------+--------+------+-------------+------------
>  -1 | e  | e       |        |    1 | e           | e
>  15 | e  |         | e      |    2 | e           | e
> (2 rows)
> {code}
> Another query that returns different results
> Results returned by Drill - zero rows returned
> {code}
> 0: jdbc:drill:schema=dfs.tmp> SELECT * FROM (SELECT c1, c2, lead(c2) OVER ( PARTITION
BY c2 ORDER BY c1) lead_c2, lag(c2) OVER ( PARTITION BY c2 ORDER BY c1) lag_c2, ntile(3) OVER
( PARTITION BY c2 ORDER BY c1) tile, first_value(c2) OVER ( PARTITION BY c2 ORDER BY c1) firstVal_c2,
last_value(c2) OVER ( PARTITION BY c2 ORDER BY c1) lastVal_c2 FROM `tblWnulls.parquet`) sub_query
where firstVal_c2 is null ORDER BY tile, c1;
> +-----+-----+----------+---------+-------+--------------+-------------+
> | c1  | c2  | lead_c2  | lag_c2  | tile  | firstVal_c2  | lastVal_c2  |
> +-----+-----+----------+---------+-------+--------------+-------------+
> +-----+-----+----------+---------+-------+--------------+-------------+
> No rows selected (0.279 seconds)
> {code}
> Results returned by Postgres for same input data, 4 rows returned
> {code}
> postgres=# SELECT * FROM (SELECT c1, c2, lead(c2) OVER ( PARTITION BY c2 ORDER BY c1)
lead_c2, lag(c2) OVER ( PARTITION BY c2 ORDER BY c1) lag_c2, ntile(3) OVER ( PARTITION BY
c2 ORDER BY c1) tile, first_value(c2) OVER ( PARTITION BY c2 ORDER BY c1) firstVal_c2, last_value(c2)
OVER ( PARTITION BY c2 ORDER BY c1) lastVal_c2 FROM t222) sub_query where firstVal_c2 is null
ORDER BY tile, c1;
>    c1    | c2 | lead_c2 | lag_c2 | tile | firstval_c2 | lastval_c2 
> ---------+----+---------+--------+------+-------------+------------
>       19 |    |         |        |    1 |             | 
>    65536 |    |         |        |    1 |             | 
>  1000000 |    |         |        |    2 |             | 
>          |    |         |        |    3 |             | 
> (4 rows)
> {code}
> Table definition on Postgres
> {code}
> postgres=# \d t222
>         Table "public.t222"
>  Column |     Type     | Modifiers 
> --------+--------------+-----------
>  c1     | integer      | 
>  c2     | character(1) | 
> {code}
> Data from the parquet file used in above queries on Drill
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select * from `tblWnulls.parquet`;
> +-------------+-------+
> |     c1      |  c2   |
> +-------------+-------+
> | 1           | a     |
> | 2           | b     |
> | 13          | c     |
> | 4           | c     |
> | 5           | a     |
> | 6           | c     |
> | null        | d     |
> | 17          | b     |
> | 8           | c     |
> | 9           | b     |
> | 10          | d     |
> | 2147483647  | d     |
> | 10          | a     |
> | 11          | a     |
> | null        | c     |
> | 11          | d     |
> | 12          | c     |
> | 19          | null  |
> | 13          | b     |
> | 14          | a     |
> | 13          | c     |
> | 15          | e     |
> | -1          | e     |
> | 0           | a     |
> | 2147483647  | d     |
> | null        | d     |
> | 65536       | null  |
> | 1000000     | null  |
> | null        | null  |
> | 11111       | a     |
> +-------------+-------+
> 30 rows selected (0.145 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message