drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Abhishek Girish (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (DRILL-3830) Query with aggregate window functions returns possibly wrong results on large scale data
Date Sat, 26 Sep 2015 00:23:04 GMT

    [ https://issues.apache.org/jira/browse/DRILL-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14908929#comment-14908929
] 

Abhishek Girish edited comment on DRILL-3830 at 9/26/15 12:22 AM:
------------------------------------------------------------------

Drill:
{code:sql}
SELECT SUM(ss.ss_net_paid_inc_tax) as sum1 FROM store_sales ss GROUP BY ss.ss_store_sk ORDER
BY sum1 desc LIMIT 5;
+------------------------+
|          sum1          |
+------------------------+
| 1.1648665144581015E11  |
| 9.68834885966E9        |
| 9.681969937690002E9    |
| 9.677523424939997E9    |
| 9.673773185729992E9    |
+------------------------+
5 rows selected (10.477 seconds)
{code}

Greenplum:
{code:sql}
SELECT SUM(ss.ss_net_paid_inc_tax) as sum1 FROM store_sales ss GROUP BY ss.ss_store_sk ORDER
BY sum1 desc LIMIT 5;
       sum1       
------------------
 116486651445.811
 9688348859.65997
 9681969937.69002
 9677523424.93999
 9673773185.73007
(5 rows)
{code}

Another query:
{code:sql}

SELECT ss.ss_store_sk, SUM(ss.ss_net_paid_inc_tax) as sum1 FROM store_sales ss GROUP BY ss.ss_store_sk
HAVING ss_store_sk = 650 ORDER BY sum1 desc;

Drill: 9.688348859660004E9

GP:  9688348859.65997
{code}




was (Author: agirish):

Drill:
{code:sql}
SELECT SUM(ss.ss_net_paid_inc_tax) as sum1 FROM store_sales ss GROUP BY ss.ss_store_sk ORDER
BY sum1 desc LIMIT 5;
+------------------------+
|          sum1          |
+------------------------+
| 1.1648665144581015E11  |
| 9.68834885966E9        |
| 9.681969937690002E9    |
| 9.677523424939997E9    |
| 9.673773185729992E9    |
+------------------------+
5 rows selected (10.477 seconds)
{code}

Greenplum:
{code:sql}
SELECT SUM(ss.ss_net_paid_inc_tax) as sum1 FROM store_sales ss GROUP BY ss.ss_store_sk ORDER
BY sum1 desc LIMIT 5;
       sum1       
------------------
 116486651445.811
 9688348859.65997
 9681969937.69002
 9677523424.93999
 9673773185.73007
(5 rows)
{code}

Drill:
{code:sql}

SELECT ss.ss_store_sk, SUM(ss.ss_net_paid_inc_tax) as sum1 FROM store_sales ss GROUP BY ss.ss_store_sk
HAVING ss_store_sk = 650 ORDER BY sum1 desc;

Drill: 9.688348859660004E9

GP:  9688348859.65997
{code}



> Query with aggregate window functions returns possibly wrong results on large scale data
> ----------------------------------------------------------------------------------------
>
>                 Key: DRILL-3830
>                 URL: https://issues.apache.org/jira/browse/DRILL-3830
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.2.0
>         Environment: 10 Performance Nodes
> DRILL_MAX_DIRECT_MEMORY=100g
> DRILL_INIT_HEAP="8g"
> DRILL_MAX_HEAP="8g"
> planner.memory.query_max_memory_per_node bumped up to 20 GB
> TPC-DS SF 1000 dataset (Parquet)
>            Reporter: Abhishek Girish
>            Assignee: Deneche A. Hakim
>         Attachments: drill_sf1_plan.txt, gpdb_sf1000_plan.txt, gpdb_sf1_plan.txt
>
>
> Results returned by the following two queries slightly differ from those returned  by
Greenplum DB. 
> {code:sql}
> SELECT SUM(ss.ss_net_paid_inc_tax) OVER (PARTITION BY ss.ss_store_sk) FROM store_sales
ss LIMIT 1;
> SELECT SUM(ss.ss_net_paid_inc_tax) OVER (PARTITION BY ss.ss_store_sk ORDER BY ss.ss_store_sk)
FROM store_sales ss LIMIT 2;
> Drill:
> 9.653697131700665E9
> Greenplum DB:
> 9.628946925860903E9
> P.S. Both queries return same results
> {code}
> I was unable to reproduce this on smaller scale (tried SF 1). I'll attach plans from
both systems. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message