drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aman Sinha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-2380) TPC-DS Query 33 and simplified variants return wrong results
Date Sat, 14 Mar 2015 00:37:38 GMT

    [ https://issues.apache.org/jira/browse/DRILL-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361405#comment-14361405
] 

Aman Sinha commented on DRILL-2380:
-----------------------------------

Actually, the simplified query is doing both sum and group-by on the same column.. which is
fine but it does not match the intent of the original TPCDS query.  The simplified query would
better be written as: 
{code}
SELECT y, sum(x)
FROM
(SELECT ss_ext_sales_price x, ss_item_sk y
FROM  store_sales
 GROUP BY ss_item_sk, ss_ext_sales_price
UNION ALL
SELECT cs_ext_sales_price x, cs_item_sk y
FROM catalog_sales
GROUP BY cs_item_sk, cs_ext_sales_price) tmp
GROUP BY y
ORDER BY y
LIMIT 10;
{code}

> TPC-DS Query 33 and simplified variants return wrong results
> ------------------------------------------------------------
>
>                 Key: DRILL-2380
>                 URL: https://issues.apache.org/jira/browse/DRILL-2380
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 0.8.0
>            Reporter: Abhishek Girish
>            Assignee: Sean Hsuan-Yi Chu
>            Priority: Critical
>             Fix For: 0.9.0
>
>
> TPC-DS query 33 returns wrong results. 
> {code:sql}
> WITH ss 
>      AS (SELECT i_manufact_id, 
>                 Sum(ss_ext_sales_price) total_sales 
>          FROM   store_sales, 
>                 date_dim, 
>                 customer_address, 
>                 item 
>          WHERE  i_manufact_id IN (SELECT i_manufact_id 
>                                   FROM   item 
>                                   WHERE  i_category IN ( 'Books' )) 
>                 AND ss_item_sk = i_item_sk 
>                 AND ss_sold_date_sk = d_date_sk 
>                 AND d_year = 1999 
>                 AND d_moy = 3 
>                 AND ss_addr_sk = ca_address_sk 
>                 AND ca_gmt_offset = -5 
>          GROUP  BY i_manufact_id), 
>      cs 
>      AS (SELECT i_manufact_id, 
>                 Sum(cs_ext_sales_price) total_sales 
>          FROM   catalog_sales, 
>                 date_dim, 
>                 customer_address, 
>                 item 
>          WHERE  i_manufact_id IN (SELECT i_manufact_id 
>                                   FROM   item 
>                                   WHERE  i_category IN ( 'Books' )) 
>                 AND cs_item_sk = i_item_sk 
>                 AND cs_sold_date_sk = d_date_sk 
>                 AND d_year = 1999 
>                 AND d_moy = 3 
>                 AND cs_bill_addr_sk = ca_address_sk 
>                 AND ca_gmt_offset = -5 
>          GROUP  BY i_manufact_id), 
>      ws 
>      AS (SELECT i_manufact_id, 
>                 Sum(ws_ext_sales_price) total_sales 
>          FROM   web_sales, 
>                 date_dim, 
>                 customer_address, 
>                 item 
>          WHERE  i_manufact_id IN (SELECT i_manufact_id 
>                                   FROM   item 
>                                   WHERE  i_category IN ( 'Books' )) 
>                 AND ws_item_sk = i_item_sk 
>                 AND ws_sold_date_sk = d_date_sk 
>                 AND d_year = 1999 
>                 AND d_moy = 3 
>                 AND ws_bill_addr_sk = ca_address_sk 
>                 AND ca_gmt_offset = -5 
>          GROUP  BY i_manufact_id) 
> SELECT i_manufact_id, 
>                Sum(total_sales) total_sales 
> FROM   (SELECT i_manufact_id, total_sales 
>         FROM   ss 
>         UNION ALL 
>         SELECT i_manufact_id, total_sales
>         FROM   cs 
>         UNION ALL 
>         SELECT i_manufact_id, total_sales
>         FROM   ws) tmp1 
> GROUP  BY i_manufact_id 
> ORDER  BY total_sales
> LIMIT 10;
> Drill Results:
> +---------------+-------------+
> | i_manufact_id | total_sales |
> +---------------+-------------+
> | 440           | 0.12        |
> | 434           | 13.16       |
> | 415           | 14.04       |
> | 449           | 15.63       |
> | 563           | 31.46       |
> | 357           | 49.50       |
> | 624           | 67.94       |
> | 192           | 74.40       |
> | 137           | 83.42       |
> | 240           | 85.26       |
> +---------------+-------------+
> 10 rows selected (7.57 seconds)
> Postgres Results:
>  i_manufact_id | total_sales 
> ---------------+-------------
>            930 |        1.18
>            818 |       41.86
>            913 |      141.90
>            784 |      184.90
>            488 |      275.08
>            993 |      301.60
>            700 |      340.52
>            895 |      802.30
>            766 |      839.76
>            858 |      859.18
> (10 rows)
> {code}
> The following simplified variants also return wrong results:
> {code:sql}
> SELECT sum(x)
> FROM
> (SELECT ss_ext_sales_price x, ss_item_sk
> FROM  store_sales
>  GROUP BY ss_item_sk, ss_ext_sales_price
> UNION ALL
> SELECT cs_ext_sales_price x, cs_item_sk
> FROM catalog_sales
> GROUP BY cs_item_sk, cs_ext_sales_price) tmp
> GROUP BY x
> LIMIT 10;
> Drill Results:
> +------------+
> |   EXPR$0   |
> +------------+
> | 14141.40   |
> | 28060.00   |
> | 30912.70   |
> | 43706.88   |
> | 38267.64   |
> | 10173.00   |
> | 37829.25   |
> | 5349.50    |
> | 107515.80  |
> | 4440.84    |
> +------------+
> 10 rows selected (14.435 seconds)
> Postgres Results:
>    sum    
> ----------   
>  45234.00
>   5735.31
>   2275.60
>   6921.32
>   2590.46
>   6615.09
>  14080.77
>  24819.76
>  25127.20
> (10 rows)
> SELECT sum(x)
> FROM
> (SELECT sum(ss_ext_sales_price) x, ss_item_sk
> FROM  store_sales
>  GROUP BY ss_item_sk
> UNION ALL
> SELECT sum(cs_ext_sales_price) x, cs_item_sk
> FROM catalog_sales
> GROUP BY cs_item_sk) tmp
> GROUP BY x
> LIMIT 10;
> Drill Results:
> +------------+
> |   EXPR$0   |
> +------------+
> | 211411.58  |
> | 347027.93  |
> | 534760.93  |
> | 203028.28  |
> | 500939.61  |
> | 248226.81  |
> | 242664.29  |
> | 597659.03  |
> | 258909.73  |
> | 223624.06  |
> +------------+
> 10 rows selected (5.245 seconds)
> Postgres Results:
>     sum    
> -----------
>  252711.42
>  173571.97
>  206191.60
>  249793.96
>  170825.75
>  127718.29
>  220887.50
>  119390.44
>  217495.66
>  284348.93
> (10 rows)
> SELECT x
> FROM
> (SELECT ss_ext_sales_price x, ss_item_sk
> FROM  store_sales
>  GROUP BY ss_item_sk, ss_ext_sales_price
> UNION ALL
> SELECT cs_ext_sales_price x, cs_item_sk
> FROM catalog_sales
> GROUP BY cs_item_sk, cs_ext_sales_price) tmp
> GROUP BY x
> LIMIT 10;
> Drill Results:
> +------------+
> |     x      |
> +------------+
> | 271.95     |
> | 561.20     |
> | 391.30     |
> | 1821.12    |
> | 2125.98    |
> | 1695.50    |
> | 1513.17    |
> | 411.50     |
> | 4674.60    |
> | 193.08     |
> +------------+
> 10 rows selected (9.518 seconds)
> Postgres Results:
>     x    
> ---------
>  9046.80
>  5735.31
>   568.90
>  3460.66
>  1295.23
>  6615.09
>  4693.59
>  6204.94
>  6281.80
> (10 rows)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message