drill-issues mailing list archives

From "Aman Sinha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4590) TPC-H q17 returns wrong results when applied to views
Date Thu, 07 Apr 2016 16:25:25 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15230493#comment-15230493 ]

Aman Sinha commented on DRILL-4590:
-----------------------------------

Based on an initial look, this seems to be an issue with decorrelation of the correlated subquery.
In the query against the view, the following group-by is being done on {{c_custkey}}, even though
this column is not referenced anywhere in the query:
{noformat}
00-15 HashAgg(group=[{0}])
00-17 Project(c_custkey=[$0])
{noformat}

This is the wrong column for the group-by. It should be {{c_nationkey}}, which is the correlation
column.
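
To illustrate why the group-by column matters, here is a minimal sketch of decorrelation applied to the q17 text quoted below, where the correlation predicate is l2.l_partkey = p.p_partkey. It is not Drill's planner output. It uses Python's built-in sqlite3 module and a tiny made-up data set (the rows, the subquery alias t, and the column aliases partkey and threshold are all assumptions for illustration) to show that a correlated scalar subquery can be rewritten as a join against an aggregate grouped on the correlation column.

```python
import sqlite3

# Toy stand-ins for the TPC-H tables; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE part (p_partkey INTEGER, p_brand TEXT, p_container TEXT);
    CREATE TABLE lineitem (l_partkey INTEGER, l_quantity REAL, l_extendedprice REAL);
    INSERT INTO part VALUES
        (1, 'Brand#13', 'JUMBO CAN'),
        (2, 'Brand#13', 'JUMBO CAN');
    INSERT INTO lineitem VALUES
        (1, 1, 70.0), (1, 10, 700.0), (1, 19, 1400.0),
        (2, 2, 140.0), (2, 50, 2100.0), (2, 98, 3500.0);
""")

# Original form: the scalar subquery is re-evaluated per outer row of part.
correlated_sql = """
    SELECT sum(l.l_extendedprice) / 7.0
    FROM lineitem l, part p
    WHERE p.p_partkey = l.l_partkey
      AND p.p_brand = 'Brand#13'
      AND p.p_container = 'JUMBO CAN'
      AND l.l_quantity < (
          SELECT 0.2 * avg(l2.l_quantity)
          FROM lineitem l2
          WHERE l2.l_partkey = p.p_partkey)
"""

# Decorrelated form: aggregate once, grouped on the correlation column
# (l_partkey), then join the per-key thresholds back to the outer query.
decorrelated_sql = """
    SELECT sum(l.l_extendedprice) / 7.0
    FROM lineitem l
    JOIN part p ON p.p_partkey = l.l_partkey
    JOIN (SELECT l2.l_partkey AS partkey,
                 0.2 * avg(l2.l_quantity) AS threshold
          FROM lineitem l2
          GROUP BY l2.l_partkey) t ON t.partkey = p.p_partkey
    WHERE p.p_brand = 'Brand#13'
      AND p.p_container = 'JUMBO CAN'
      AND l.l_quantity < t.threshold
"""

correlated = conn.execute(correlated_sql).fetchone()[0]
decorrelated = conn.execute(decorrelated_sql).fetchone()[0]
print(correlated, decorrelated)
```

The two forms agree only because the inner aggregate is grouped on the same column that appears in the correlation predicate. Grouping on any other column changes which rows feed each average and therefore the final sum.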

> TPC-H q17 returns wrong results when applied to views
> -----------------------------------------------------
>
>                 Key: DRILL-4590
>                 URL: https://issues.apache.org/jira/browse/DRILL-4590
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>    Affects Versions: 1.6.0
>         Environment: RHEL 6.4  2.6.32-358.el6.x86_64
>            Reporter: Dechang Gu
>            Priority: Critical
>
> When running TPC-H queries on views created from Parquet tables, query 17 returned wrong results:
> [root@ucs-node1 bugs]# /opt/mapr/drill/drill-1.6.0/bin/sqlline -u "jdbc:drill:schema=dfs.tpchViews" -f /tmp/TPCH_17.sql
> 1/1          select 
> sum(l.l_extendedprice) / 7.0 as avg_yearly 
> from 
> lineitem l, 
> part p 
> where 
> p.p_partkey = l.l_partkey 
> and p.p_brand = 'Brand#13' 
> and p.p_container = 'JUMBO CAN' 
> and l.l_quantity < ( 
> select 
> 0.2 * avg(l2.l_quantity) 
> from 
> lineitem l2 
> where 
> l2.l_partkey = p.p_partkey 
> );
> +---------------------+
> |     avg_yearly      |
> +---------------------+
> | 1139490.7042857148  |
> +---------------------+
> 1 row selected (20.364 seconds)
> The same query run directly on the Parquet tables returns the correct result:
> [root@ucs-node1 bugs]# /opt/mapr/drill/drill-1.6.0/bin/sqlline -u "jdbc:drill:schema=dfs.parquet" -f /tmp/17_par100.q
> 1/1          select 
> sum(l.l_extendedprice) / 7.0 as avg_yearly 
> from 
> lineitem_par100 l, 
> part_par100 p 
> where 
> p.p_partkey = l.l_partkey 
> and p.p_brand = 'Brand#13' 
> and p.p_container = 'JUMBO CAN' 
> and l.l_quantity < ( 
> select 
> 0.2 * avg(l2.l_quantity) 
> from 
> lineitem_par100 l2 
> where 
> l2.l_partkey = p.p_partkey 
> );
> +----------------------+
> |      avg_yearly      |
> +----------------------+
> | 3.237333813714285E7  |
> +----------------------+
> 1 row selected (25.266 seconds)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
