drill-dev mailing list archives

From "Abhishek Girish (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-3830) Query with aggregate window functions returns possibly wrong results on large scale data
Date Wed, 23 Sep 2015 21:48:04 GMT
Abhishek Girish created DRILL-3830:
--------------------------------------

             Summary: Query with aggregate window functions returns possibly wrong results on large scale data
                 Key: DRILL-3830
                 URL: https://issues.apache.org/jira/browse/DRILL-3830
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Relational Operators
    Affects Versions: 1.2.0
         Environment: 10 Performance Nodes
DRILL_MAX_DIRECT_MEMORY=100g
DRILL_INIT_HEAP="8g"
DRILL_MAX_HEAP="8g"
planner.memory.query_max_memory_per_node bumped up to 20 GB
TPC-DS SF 1000 dataset (Parquet)
            Reporter: Abhishek Girish
            Assignee: Deneche A. Hakim


Results returned by the following two queries differ slightly from those returned by Greenplum DB.

{code:sql}
SELECT SUM(ss.ss_net_paid_inc_tax) OVER (PARTITION BY ss.ss_store_sk) FROM store_sales ss
LIMIT 1;

SELECT SUM(ss.ss_net_paid_inc_tax) OVER (PARTITION BY ss.ss_store_sk ORDER BY ss.ss_store_sk)
FROM store_sales ss LIMIT 2;
{code}

{noformat}
Drill:
9.653697131700665E9

Greenplum DB:
9.628946925860903E9
{noformat}

P.S. Both queries return the same results.
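The two queries are expected to agree. With ORDER BY on the partition key itself, every row in a partition is a peer of every other row, so the default frame (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) covers the whole partition. Below is a minimal sketch of a cross-check on toy data, assuming Python's built-in sqlite3 module backed by SQLite 3.25 or later (the first release with window functions). The rows in TOY_ROWS and the helper compare() are hypothetical stand-ins for store_sales. The sketch drops LIMIT so every row can be compared, since without an outer ORDER BY the row that LIMIT returns is not guaranteed to be the same across engines, and it checks each window sum against a plain GROUP BY total.

{code:python}
import sqlite3

# Hypothetical toy rows standing in for store_sales:
# (ss_store_sk, ss_net_paid_inc_tax)
TOY_ROWS = [(1, 10.0), (1, 20.0), (2, 5.0), (2, 7.5), (2, 2.5)]

# Same window as the first query in the report, without LIMIT.
PARTITION_ONLY = """
SELECT ss.ss_store_sk,
       SUM(ss.ss_net_paid_inc_tax) OVER (PARTITION BY ss.ss_store_sk)
FROM store_sales ss
"""

# Same window as the second query: ORDER BY on the partition key makes all
# rows in a partition peers, so the default RANGE frame spans the partition.
PARTITION_AND_ORDER = """
SELECT ss.ss_store_sk,
       SUM(ss.ss_net_paid_inc_tax) OVER (PARTITION BY ss.ss_store_sk
                                         ORDER BY ss.ss_store_sk)
FROM store_sales ss
"""

# Independent reference: one total per store via plain aggregation.
GROUP_BY_REFERENCE = """
SELECT ss.ss_store_sk, SUM(ss.ss_net_paid_inc_tax)
FROM store_sales ss
GROUP BY ss.ss_store_sk
"""


def compare():
    """Run all three queries on the toy table (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE store_sales "
                     "(ss_store_sk INTEGER, ss_net_paid_inc_tax REAL)")
        conn.executemany("INSERT INTO store_sales VALUES (?, ?)", TOY_ROWS)
        # Sort so the comparison does not depend on engine row order.
        first = sorted(conn.execute(PARTITION_ONLY).fetchall())
        second = sorted(conn.execute(PARTITION_AND_ORDER).fetchall())
        reference = dict(conn.execute(GROUP_BY_REFERENCE).fetchall())
    finally:
        conn.close()
    return first, second, reference


if __name__ == "__main__":
    first, second, reference = compare()
    # Both window queries should agree row for row...
    assert first == second
    # ...and every row should carry its store's GROUP BY total.
    assert all(total == reference[store] for store, total in first)
    print("window sums match GROUP BY totals:", reference)
{code}

On the full dataset, the GROUP BY form gives one total per ss_store_sk that both engines can be checked against without relying on which row LIMIT happens to return.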

I was unable to reproduce this at a smaller scale (tried SF 1). I'll attach the query plans from both systems.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
