drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Deneche A. Hakim (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DRILL-4453) Difference in results over char data, window function query
Date Mon, 07 Mar 2016 11:03:40 GMT

     [ https://issues.apache.org/jira/browse/DRILL-4453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Deneche A. Hakim updated DRILL-4453:
------------------------------------
    Description: 
Window function query with frame clause returns results that are different from those returned
by same query on Postgres 9.3 of same data.

Note that the two tables have same number of nulls in both Drill and Postgres.

The length of the result returned by MIN function is different on Postgres 9.3 vs Drill 1.6.0
{noformat}
Drill 1.6.0 => returns 1 as length.

0: jdbc:drill:schema=dfs.tmp> select length(min(c4)) from dfs.tmp.`t_alltype`;
+---------+
| EXPR$0  |
+---------+
| 1       |
+---------+
1 row selected (0.282 seconds)
{noformat}

Postgress 9.3 returns 0 as length.
{noformat}
postgres=# select length(min(c4)) from t_alltype;
 length
--------
      0
(1 row)
{noformat}

{noformat}
postgres=# \d t_alltype
             Table "public.t_alltype"
 Column |            Type             | Modifiers
--------+-----------------------------+-----------
 c1     | integer                     |
 c2     | integer                     |
 c3     | bigint                      |
 c4     | character(256)              |
 c5     | character varying(256)      |
 c6     | timestamp without time zone |
 c7     | date                        |
 c8     | boolean                     |
 c9     | double precision            |
postgres=# select c4 from t_alltype where c4 is null;
 c4
----
(3 rows)
{noformat}

{noformat}
postgres=# SELECT MIN(c4) OVER(PARTITION BY c8 ORDER BY c1 ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) FROM t_alltype;
                                                                                         
                                     min
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 gwfrW
 ZAFOcferhjkcl
 ZAFOcferhjkcl
 ZAFOcferhjkcl
 ZAFOcferhjkcl
 ...
 ...
 
 ApKK
 ApKK















(145 rows)
{noformat}

Parquet schema details

{noformat}
[root@centos-01 parquet-tools]# ./parquet-schema ./Datasources/window_functions/t_alltype.parquet
message root {
  optional int32 c1;
  optional int32 c2;
  optional int64 c3;
  optional binary c4 (UTF8);
  optional binary c5 (UTF8);
  optional int64 c6 (TIMESTAMP_MILLIS);
  optional int32 c7 (DATE);
  optional boolean c8;
  optional double c9;
}
{noformat}

On Drill 1.6.0 

{noformat}
0: jdbc:drill:schema=dfs.tmp> SELECT MIN(c4) OVER(PARTITION BY c8 ORDER BY c1 ROWS BETWEEN
UNBOUNDED PRECEDING AND CURRENT ROW) FROM dfs.tmp.`t_alltype`;
+--------------------------------+
|             EXPR$0             |
+--------------------------------+
| gwfrW                          |
| ZAFOcferhjkcl                  |
| ZAFOcferhjkcl                  |
| ZAFOcferhjkcl                  |
| ZAFOcferhjkcl                  |
...
...
| ApKK                         |
| ApKK                         |
|                              |
|                              |
|                              |
|                              |
|                              |
|                              |
|                              |
|                              |
|                              |
|                              |
| null                         |
| null                         |
|                              |
|                              |
|                              |
+------------------------------+
145 rows selected (0.409 seconds)
{noformat}

  was:
Window function query with frame clause returns results that are different from those returned
by same query on Postgres 9.3 of same data.

Note that the two tables have same number of nulls in both Drill and Postgres.

The length of the result returned by MIN function is different on Postgres 9.3 vs Drill 1.6.0
{noformat}
Drill 1.6.0 => returns 1 as length.

0: jdbc:drill:schema=dfs.tmp> select length(min(c4)) from dfs.tmp.`t_alltype`;
+---------+
| EXPR$0  |
+---------+
| 1       |
+---------+
1 row selected (0.282 seconds)

Postgress 9.3 returns 0 as lenght.

postgres=# select length(min(c4)) from t_alltype;
 length
--------
      0
(1 row)
{noformat}

{noformat}
postgres=# \d t_alltype
             Table "public.t_alltype"
 Column |            Type             | Modifiers
--------+-----------------------------+-----------
 c1     | integer                     |
 c2     | integer                     |
 c3     | bigint                      |
 c4     | character(256)              |
 c5     | character varying(256)      |
 c6     | timestamp without time zone |
 c7     | date                        |
 c8     | boolean                     |
 c9     | double precision            |
postgres=# select c4 from t_alltype where c4 is null;
 c4
----



(3 rows)

{noformat}
postgres=# SELECT MIN(c4) OVER(PARTITION BY c8 ORDER BY c1 ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) FROM t_alltype;
                                                                                         
                                     min
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 gwfrW
 ZAFOcferhjkcl
 ZAFOcferhjkcl
 ZAFOcferhjkcl
 ZAFOcferhjkcl
 ...
 ...
 
 ApKK
 ApKK















(145 rows)
{noformat}

Parquet schema details

{noformat}
[root@centos-01 parquet-tools]# ./parquet-schema ./Datasources/window_functions/t_alltype.parquet
message root {
  optional int32 c1;
  optional int32 c2;
  optional int64 c3;
  optional binary c4 (UTF8);
  optional binary c5 (UTF8);
  optional int64 c6 (TIMESTAMP_MILLIS);
  optional int32 c7 (DATE);
  optional boolean c8;
  optional double c9;
}
{noformat}

On Drill 1.6.0 

{noformat}
0: jdbc:drill:schema=dfs.tmp> SELECT MIN(c4) OVER(PARTITION BY c8 ORDER BY c1 ROWS BETWEEN
UNBOUNDED PRECEDING AND CURRENT ROW) FROM dfs.tmp.`t_alltype`;
+--------------------------------+
|             EXPR$0             |
+--------------------------------+
| gwfrW                          |
| ZAFOcferhjkcl                  |
| ZAFOcferhjkcl                  |
| ZAFOcferhjkcl                  |
| ZAFOcferhjkcl                  |
...
...
| ApKK                         |
| ApKK                         |
|                              |
|                              |
|                              |
|                              |
|                              |
|                              |
|                              |
|                              |
|                              |
|                              |
| null                         |
| null                         |
|                              |
|                              |
|                              |
+------------------------------+
145 rows selected (0.409 seconds)
{noformat}


> Difference in results over char data, window function query
> -----------------------------------------------------------
>
>                 Key: DRILL-4453
>                 URL: https://issues.apache.org/jira/browse/DRILL-4453
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.6.0
>         Environment: 4 node cluster
>            Reporter: Khurram Faraaz
>              Labels: window_function
>         Attachments: t_alltype.csv, t_alltype.parquet
>
>
> Window function query with frame clause returns results that are different from those
returned by same query on Postgres 9.3 of same data.
> Note that the two tables have same number of nulls in both Drill and Postgres.
> The length of the result returned by MIN function is different on Postgres 9.3 vs Drill
1.6.0
> {noformat}
> Drill 1.6.0 => returns 1 as length.
> 0: jdbc:drill:schema=dfs.tmp> select length(min(c4)) from dfs.tmp.`t_alltype`;
> +---------+
> | EXPR$0  |
> +---------+
> | 1       |
> +---------+
> 1 row selected (0.282 seconds)
> {noformat}
> Postgress 9.3 returns 0 as length.
> {noformat}
> postgres=# select length(min(c4)) from t_alltype;
>  length
> --------
>       0
> (1 row)
> {noformat}
> {noformat}
> postgres=# \d t_alltype
>              Table "public.t_alltype"
>  Column |            Type             | Modifiers
> --------+-----------------------------+-----------
>  c1     | integer                     |
>  c2     | integer                     |
>  c3     | bigint                      |
>  c4     | character(256)              |
>  c5     | character varying(256)      |
>  c6     | timestamp without time zone |
>  c7     | date                        |
>  c8     | boolean                     |
>  c9     | double precision            |
> postgres=# select c4 from t_alltype where c4 is null;
>  c4
> ----
> (3 rows)
> {noformat}
> {noformat}
> postgres=# SELECT MIN(c4) OVER(PARTITION BY c8 ORDER BY c1 ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) FROM t_alltype;
>                                                                                     
                                          min
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  gwfrW
>  ZAFOcferhjkcl
>  ZAFOcferhjkcl
>  ZAFOcferhjkcl
>  ZAFOcferhjkcl
>  ...
>  ...
>  
>  ApKK
>  ApKK
> (145 rows)
> {noformat}
> Parquet schema details
> {noformat}
> [root@centos-01 parquet-tools]# ./parquet-schema ./Datasources/window_functions/t_alltype.parquet
> message root {
>   optional int32 c1;
>   optional int32 c2;
>   optional int64 c3;
>   optional binary c4 (UTF8);
>   optional binary c5 (UTF8);
>   optional int64 c6 (TIMESTAMP_MILLIS);
>   optional int32 c7 (DATE);
>   optional boolean c8;
>   optional double c9;
> }
> {noformat}
> On Drill 1.6.0 
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> SELECT MIN(c4) OVER(PARTITION BY c8 ORDER BY c1 ROWS
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) FROM dfs.tmp.`t_alltype`;
> +--------------------------------+
> |             EXPR$0             |
> +--------------------------------+
> | gwfrW                          |
> | ZAFOcferhjkcl                  |
> | ZAFOcferhjkcl                  |
> | ZAFOcferhjkcl                  |
> | ZAFOcferhjkcl                  |
> ...
> ...
> | ApKK                         |
> | ApKK                         |
> |                              |
> |                              |
> |                              |
> |                              |
> |                              |
> |                              |
> |                              |
> |                              |
> |                              |
> |                              |
> | null                         |
> | null                         |
> |                              |
> |                              |
> |                              |
> +------------------------------+
> 145 rows selected (0.409 seconds)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message