drill-issues mailing list archives

From "Aman Sinha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4825) Wrong data with UNION ALL when querying different sub-directories under the same table
Date Wed, 03 Aug 2016 23:24:20 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15406809#comment-15406809 ]

Aman Sinha commented on DRILL-4825:
-----------------------------------

Also, this is not specific to union-all.  I can also repro via a join query of 2 subqueries
each of which has a directory partition filter.  In this sense, it is likely related to partition
pruning.  
{noformat}
0: jdbc:drill:zk=local> select min(o_custkey) from dfs.`multilevel/parquet` where dir0 = 1996;
+---------+
| EXPR$0  |
+---------+
| 91      |
+---------+
1 row selected (0.339 seconds)
0: jdbc:drill:zk=local> select min(o_custkey) from dfs.`multilevel/parquet` where dir0 = 1994;
+---------+
| EXPR$0  |
+---------+
| 25      |
+---------+
1 row selected (0.238 seconds)

// the two minimums differ (25 vs. 91), so this query should produce 0 rows, but it returns 1 row
0: jdbc:drill:zk=local> select * from (select min(o_custkey) as x from dfs.`multilevel/parquet` where dir0 = 1994) inner join (select min(o_custkey) as y from dfs.`multilevel/parquet` where dir0 = 1996) on x = y;
+-----+-----+
|  x  |  y  |
+-----+-----+
| 25  | 25  |
+-----+-----+
1 row selected (0.995 seconds)
{noformat}

It would be useful to narrow down when this started failing.  
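
One way to check the partition-pruning hypothesis is to look at the plan for the join query above. This is only a sketch: it reuses the same table and filters, and the plan text will differ between builds. If the two Scan operators list the same files (for example, only files under the 1994 directory), that would point at pruning rather than at the join itself.
{noformat}
explain plan for
select * from
  (select min(o_custkey) as x from dfs.`multilevel/parquet` where dir0 = 1994)
  inner join
  (select min(o_custkey) as y from dfs.`multilevel/parquet` where dir0 = 1996)
  on x = y;
{noformat}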

> Wrong data with UNION ALL when querying different sub-directories under the same table
> --------------------------------------------------------------------------------------
>
>                 Key: DRILL-4825
>                 URL: https://issues.apache.org/jira/browse/DRILL-4825
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.8.0
>            Reporter: Rahul Challapalli
>            Priority: Critical
>         Attachments: l_3level.tgz
>
>
> git.commit.id.abbrev=0700c6b
> The query below returns wrong results:
> {code}
> select count (*) from (
>   select l_orderkey, dir0 from l_3level t1 where t1.dir0 = 1 and t1.dir1='one' and t1.dir2 = '2015-7-12'
>   union all 
>   select l_orderkey, dir0 from l_3level t2 where t2.dir0 = 1 and t2.dir1='two' and t2.dir2 = '2015-8-12') data;
> +---------+
> | EXPR$0  |
> +---------+
> | 20      |
> +---------+
> {code}
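> The wrong result is evident from the output of the queries below: the two branches return 30 and 10 rows on their own, so the UNION ALL should count 40 rows, not 20.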
> {code}
> 0: jdbc:drill:zk=10.10.100.190:5181> select count (*) from (select l_orderkey, dir0 from l_3level t2 where t2.dir0 = 1 and t2.dir1='two' and t2.dir2 = '2015-8-12');
> +---------+
> | EXPR$0  |
> +---------+
> | 30      |
> +---------+
> 1 row selected (0.258 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> select count (*) from (select l_orderkey, dir0 from l_3level t2 where t2.dir0 = 1 and t2.dir1='one' and t2.dir2 = '2015-7-12');
> +---------+
> | EXPR$0  |
> +---------+
> | 10      |
> +---------+
> {code}
> I attached the data set. Let me know if you need anything more.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
