drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aman Sinha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-3668) Incorrect results FIRST_VALUE function
Date Sun, 06 Sep 2015 17:20:45 GMT

    [ https://issues.apache.org/jira/browse/DRILL-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14732463#comment-14732463
] 

Aman Sinha commented on DRILL-3668:
-----------------------------------

Updated patch looks ok to me since the new patch is resetting the internal batch vector locally
as part of Window function operator.  For future reference, could you add the link to the
discussion we had on the dev mailing list regarding this issue ?
+1.  



> Incorrect results FIRST_VALUE function
> --------------------------------------
>
>                 Key: DRILL-3668
>                 URL: https://issues.apache.org/jira/browse/DRILL-3668
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.2.0
>         Environment: private-branch https://github.com/adeneche/incubator-drill/tree/new-window-funcs
>            Reporter: Khurram Faraaz
>            Assignee: Aman Sinha
>            Priority: Critical
>              Labels: window_function
>             Fix For: 1.2.0
>
>
> Query returns incorrect results for first_value column on developers private branch.
> Query results from Drill (6 rows)
> {code}
> 0: jdbc:drill:schema=dfs.tmp> SELECT * FROM (SELECT c1, c2, lead(c2) OVER ( PARTITION
BY c2 ORDER BY c1) lead_c2, lag(c2) OVER ( PARTITION BY c2 ORDER BY c1) lag_c2, ntile(3) OVER
( PARTITION BY c2 ORDER BY c1) tile, first_value(c2) OVER ( PARTITION BY c2 ORDER BY c1) firstVal_c2,
last_value(c2) OVER ( PARTITION BY c2 ORDER BY c1) lastVal_c2 FROM `tblWnulls.parquet`) sub_query
where firstVal_c2 = 'e' ORDER BY tile, c1;
> +----------+-------+----------+---------+-------+--------------+-------------+
> |    c1    |  c2   | lead_c2  | lag_c2  | tile  | firstVal_c2  | lastVal_c2  |
> +----------+-------+----------+---------+-------+--------------+-------------+
> | -1       | e     | e        | null    | 1     | e            | e           |
> | 19       | null  | null     | null    | 1     | e            | null        |
> | 65536    | null  | null     | null    | 1     | e            | null        |
> | 15       | e     | null     | e       | 2     | e            | e           |
> | 1000000  | null  | null     | null    | 2     | e            | null        |
> | null     | null  | null     | null    | 3     | e            | null        |
> +----------+-------+----------+---------+-------+--------------+-------------+
> 6 rows selected (0.269 seconds)
> {code}
> Query results from Postgres for same input data (2 rows returned)
> {code}
> postgres=# SELECT * FROM (SELECT c1, c2, lead(c2) OVER ( PARTITION BY c2 ORDER BY c1)
lead_c2, lag(c2) OVER ( PARTITION BY c2 ORDER BY c1) lag_c2, ntile(3) OVER ( PARTITION BY
c2 ORDER BY c1) tile, first_value(c2) OVER ( PARTITION BY c2 ORDER BY c1) firstVal_c2, last_value(c2)
OVER ( PARTITION BY c2 ORDER BY c1) lastVal_c2 FROM t222) sub_query where firstVal_c2 = 'e'
ORDER BY tile, c1;
>  c1 | c2 | lead_c2 | lag_c2 | tile | firstval_c2 | lastval_c2 
> ----+----+---------+--------+------+-------------+------------
>  -1 | e  | e       |        |    1 | e           | e
>  15 | e  |         | e      |    2 | e           | e
> (2 rows)
> {code}
> Another query that returns different results
> Results returned by Drill - zero rows returned
> {code}
> 0: jdbc:drill:schema=dfs.tmp> SELECT * FROM (SELECT c1, c2, lead(c2) OVER ( PARTITION
BY c2 ORDER BY c1) lead_c2, lag(c2) OVER ( PARTITION BY c2 ORDER BY c1) lag_c2, ntile(3) OVER
( PARTITION BY c2 ORDER BY c1) tile, first_value(c2) OVER ( PARTITION BY c2 ORDER BY c1) firstVal_c2,
last_value(c2) OVER ( PARTITION BY c2 ORDER BY c1) lastVal_c2 FROM `tblWnulls.parquet`) sub_query
where firstVal_c2 is null ORDER BY tile, c1;
> +-----+-----+----------+---------+-------+--------------+-------------+
> | c1  | c2  | lead_c2  | lag_c2  | tile  | firstVal_c2  | lastVal_c2  |
> +-----+-----+----------+---------+-------+--------------+-------------+
> +-----+-----+----------+---------+-------+--------------+-------------+
> No rows selected (0.279 seconds)
> {code}
> Results returned by Postgres for same input data, 4 rows returned
> {code}
> postgres=# SELECT * FROM (SELECT c1, c2, lead(c2) OVER ( PARTITION BY c2 ORDER BY c1)
lead_c2, lag(c2) OVER ( PARTITION BY c2 ORDER BY c1) lag_c2, ntile(3) OVER ( PARTITION BY
c2 ORDER BY c1) tile, first_value(c2) OVER ( PARTITION BY c2 ORDER BY c1) firstVal_c2, last_value(c2)
OVER ( PARTITION BY c2 ORDER BY c1) lastVal_c2 FROM t222) sub_query where firstVal_c2 is null
ORDER BY tile, c1;
>    c1    | c2 | lead_c2 | lag_c2 | tile | firstval_c2 | lastval_c2 
> ---------+----+---------+--------+------+-------------+------------
>       19 |    |         |        |    1 |             | 
>    65536 |    |         |        |    1 |             | 
>  1000000 |    |         |        |    2 |             | 
>          |    |         |        |    3 |             | 
> (4 rows)
> {code}
> Table definition on Postgres
> {code}
> postgres=# \d t222
>         Table "public.t222"
>  Column |     Type     | Modifiers 
> --------+--------------+-----------
>  c1     | integer      | 
>  c2     | character(1) | 
> {code}
> Data from the parquet file used in above queries on Drill
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select * from `tblWnulls.parquet`;
> +-------------+-------+
> |     c1      |  c2   |
> +-------------+-------+
> | 1           | a     |
> | 2           | b     |
> | 13          | c     |
> | 4           | c     |
> | 5           | a     |
> | 6           | c     |
> | null        | d     |
> | 17          | b     |
> | 8           | c     |
> | 9           | b     |
> | 10          | d     |
> | 2147483647  | d     |
> | 10          | a     |
> | 11          | a     |
> | null        | c     |
> | 11          | d     |
> | 12          | c     |
> | 19          | null  |
> | 13          | b     |
> | 14          | a     |
> | 13          | c     |
> | 15          | e     |
> | -1          | e     |
> | 0           | a     |
> | 2147483647  | d     |
> | null        | d     |
> | 65536       | null  |
> | 1000000     | null  |
> | null        | null  |
> | 11111       | a     |
> +-------------+-------+
> 30 rows selected (0.145 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message