drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Khurram Faraaz (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-3668) Incorrect results FIRST_VALUE function
Date Wed, 19 Aug 2015 01:55:46 GMT
Khurram Faraaz created DRILL-3668:
-------------------------------------

             Summary: Incorrect results FIRST_VALUE function
                 Key: DRILL-3668
                 URL: https://issues.apache.org/jira/browse/DRILL-3668
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Flow
    Affects Versions: 1.2.0
         Environment: private-branch https://github.com/adeneche/incubator-drill/tree/new-window-funcs
            Reporter: Khurram Faraaz
            Assignee: Chris Westin


Query returns incorrect results for first_value column on developers private branch.

Query results from Drill (6 rows)

{code}
0: jdbc:drill:schema=dfs.tmp> SELECT * FROM (SELECT c1, c2, lead(c2) OVER ( PARTITION BY
c2 ORDER BY c1) lead_c2, lag(c2) OVER ( PARTITION BY c2 ORDER BY c1) lag_c2, ntile(3) OVER
( PARTITION BY c2 ORDER BY c1) tile, first_value(c2) OVER ( PARTITION BY c2 ORDER BY c1) firstVal_c2,
last_value(c2) OVER ( PARTITION BY c2 ORDER BY c1) lastVal_c2 FROM `tblWnulls.parquet`) sub_query
where firstVal_c2 = 'e' ORDER BY tile, c1;
+----------+-------+----------+---------+-------+--------------+-------------+
|    c1    |  c2   | lead_c2  | lag_c2  | tile  | firstVal_c2  | lastVal_c2  |
+----------+-------+----------+---------+-------+--------------+-------------+
| -1       | e     | e        | null    | 1     | e            | e           |
| 19       | null  | null     | null    | 1     | e            | null        |
| 65536    | null  | null     | null    | 1     | e            | null        |
| 15       | e     | null     | e       | 2     | e            | e           |
| 1000000  | null  | null     | null    | 2     | e            | null        |
| null     | null  | null     | null    | 3     | e            | null        |
+----------+-------+----------+---------+-------+--------------+-------------+
6 rows selected (0.269 seconds)

{code}

Query results from Postgres for same input data (2 rows returned)

{code}
postgres=# SELECT * FROM (SELECT c1, c2, lead(c2) OVER ( PARTITION BY c2 ORDER BY c1) lead_c2,
lag(c2) OVER ( PARTITION BY c2 ORDER BY c1) lag_c2, ntile(3) OVER ( PARTITION BY c2 ORDER
BY c1) tile, first_value(c2) OVER ( PARTITION BY c2 ORDER BY c1) firstVal_c2, last_value(c2)
OVER ( PARTITION BY c2 ORDER BY c1) lastVal_c2 FROM t222) sub_query where firstVal_c2 = 'e'
ORDER BY tile, c1;
 c1 | c2 | lead_c2 | lag_c2 | tile | firstval_c2 | lastval_c2 
----+----+---------+--------+------+-------------+------------
 -1 | e  | e       |        |    1 | e           | e
 15 | e  |         | e      |    2 | e           | e
(2 rows)

{code}

Another query that returns different results

Results returned by Drill - zero rows returned

{code}
0: jdbc:drill:schema=dfs.tmp> SELECT * FROM (SELECT c1, c2, lead(c2) OVER ( PARTITION BY
c2 ORDER BY c1) lead_c2, lag(c2) OVER ( PARTITION BY c2 ORDER BY c1) lag_c2, ntile(3) OVER
( PARTITION BY c2 ORDER BY c1) tile, first_value(c2) OVER ( PARTITION BY c2 ORDER BY c1) firstVal_c2,
last_value(c2) OVER ( PARTITION BY c2 ORDER BY c1) lastVal_c2 FROM `tblWnulls.parquet`) sub_query
where firstVal_c2 is null ORDER BY tile, c1;
+-----+-----+----------+---------+-------+--------------+-------------+
| c1  | c2  | lead_c2  | lag_c2  | tile  | firstVal_c2  | lastVal_c2  |
+-----+-----+----------+---------+-------+--------------+-------------+
+-----+-----+----------+---------+-------+--------------+-------------+
No rows selected (0.279 seconds)
{code}

Results returned by Postgres for same input data, 4 rows returned

{code}
postgres=# SELECT * FROM (SELECT c1, c2, lead(c2) OVER ( PARTITION BY c2 ORDER BY c1) lead_c2,
lag(c2) OVER ( PARTITION BY c2 ORDER BY c1) lag_c2, ntile(3) OVER ( PARTITION BY c2 ORDER
BY c1) tile, first_value(c2) OVER ( PARTITION BY c2 ORDER BY c1) firstVal_c2, last_value(c2)
OVER ( PARTITION BY c2 ORDER BY c1) lastVal_c2 FROM t222) sub_query where firstVal_c2 is null
ORDER BY tile, c1;
   c1    | c2 | lead_c2 | lag_c2 | tile | firstval_c2 | lastval_c2 
---------+----+---------+--------+------+-------------+------------
      19 |    |         |        |    1 |             | 
   65536 |    |         |        |    1 |             | 
 1000000 |    |         |        |    2 |             | 
         |    |         |        |    3 |             | 
(4 rows)
{code}

Table definition on Postgres

{code}
postgres=# \d t222
        Table "public.t222"
 Column |     Type     | Modifiers 
--------+--------------+-----------
 c1     | integer      | 
 c2     | character(1) | 
{code}

Data from the parquet file used in above queries on Drill

{code}
0: jdbc:drill:schema=dfs.tmp> select * from `tblWnulls.parquet`;
+-------------+-------+
|     c1      |  c2   |
+-------------+-------+
| 1           | a     |
| 2           | b     |
| 13          | c     |
| 4           | c     |
| 5           | a     |
| 6           | c     |
| null        | d     |
| 17          | b     |
| 8           | c     |
| 9           | b     |
| 10          | d     |
| 2147483647  | d     |
| 10          | a     |
| 11          | a     |
| null        | c     |
| 11          | d     |
| 12          | c     |
| 19          | null  |
| 13          | b     |
| 14          | a     |
| 13          | c     |
| 15          | e     |
| -1          | e     |
| 0           | a     |
| 2147483647  | d     |
| null        | d     |
| 65536       | null  |
| 1000000     | null  |
| null        | null  |
| 11111       | a     |
+-------------+-------+
30 rows selected (0.145 seconds)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message