drill-issues mailing list archives

From "Jinfeng Ni (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4825) Wrong data with UNION ALL when querying different sub-directories under the same table
Date Fri, 05 Aug 2016 04:24:20 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15408845#comment-15408845 ]

Jinfeng Ni commented on DRILL-4825:
-----------------------------------

The cause of this problem:
    EnumerableTableScan's digest contains only the table name and row type. After dir-based
partition pruning, we get two EnumerableTableScan instances, each with a DrillTable that has
a different file selection, yet both instances have the same digest. This works fine for
HepPlanner, but not for VolcanoPlanner, which treats them as identical. As a result, after
VolcanoPlanner runs Drill logical planning, both branches of the UNION ALL end up with the
same TableScan, which produces the incorrect plan and query result.
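
To illustrate the digest collision described above, here is a minimal Java sketch. It is not
Calcite's or Drill's actual code: ScanNode, DigestDemo, countDistinct and the directory strings
are hypothetical names made up for this example, and the digest-keyed map is only a simplified
stand-in for how a planner could treat nodes with equal digests as one node.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a table scan node; NOT Calcite's EnumerableTableScan.
final class ScanNode {
    final String tableName;
    final List<String> selectedDirs; // assumed: directories left after partition pruning

    ScanNode(String tableName, List<String> selectedDirs) {
        this.tableName = tableName;
        this.selectedDirs = selectedDirs;
    }

    // Digest built from the table name only, mirroring the problem described above.
    String digestWithoutSelection() {
        return "Scan(table=" + tableName + ")";
    }

    // Digest that also covers the file selection, so pruned scans stay distinct.
    String digestWithSelection() {
        return "Scan(table=" + tableName + ", selection=" + selectedDirs + ")";
    }
}

final class DigestDemo {
    // Registers nodes in a digest-keyed map, keeping the first node seen per digest,
    // and returns how many distinct nodes survive.
    static int countDistinct(List<ScanNode> nodes, boolean includeSelection) {
        Map<String, ScanNode> byDigest = new HashMap<>();
        for (ScanNode node : nodes) {
            String digest = includeSelection
                    ? node.digestWithSelection()
                    : node.digestWithoutSelection();
            byDigest.putIfAbsent(digest, node);
        }
        return byDigest.size();
    }

    public static void main(String[] args) {
        List<ScanNode> scans = Arrays.asList(
                new ScanNode("l_3level", Arrays.asList("1/one/2015-7-12")),
                new ScanNode("l_3level", Arrays.asList("1/two/2015-8-12")));
        System.out.println(countDistinct(scans, false)); // prints 1: the two scans collapse
        System.out.println(countDistinct(scans, true));  // prints 2: the two scans stay separate
    }
}
```

In this sketch, including the file selection in the digest is enough to keep the two pruned
scans apart. Whether the actual fix takes that form is not stated in this comment.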

 

> Wrong data with UNION ALL when querying different sub-directories under the same table
> --------------------------------------------------------------------------------------
>
>                 Key: DRILL-4825
>                 URL: https://issues.apache.org/jira/browse/DRILL-4825
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.6.0, 1.7.0, 1.8.0
>            Reporter: Rahul Challapalli
>            Assignee: Jinfeng Ni
>            Priority: Critical
>             Fix For: 1.8.0
>
>         Attachments: l_3level.tgz
>
>
> git.commit.id.abbrev=0700c6b
> The query below returns wrong results:
> {code}
> select count (*) from (
>   select l_orderkey, dir0 from l_3level t1 where t1.dir0 = 1 and t1.dir1='one' and t1.dir2 = '2015-7-12'
>   union all 
>   select l_orderkey, dir0 from l_3level t2 where t2.dir0 = 1 and t2.dir1='two' and t2.dir2 = '2015-8-12') data;
> +---------+
> | EXPR$0  |
> +---------+
> | 20      |
> +---------+
> {code}
> The wrong result is evident from the output of the queries below (10 + 30 = 40 rows expected):
> {code}
> 0: jdbc:drill:zk=10.10.100.190:5181> select count (*) from (select l_orderkey, dir0 from l_3level t2 where t2.dir0 = 1 and t2.dir1='two' and t2.dir2 = '2015-8-12');
> +---------+
> | EXPR$0  |
> +---------+
> | 30      |
> +---------+
> 1 row selected (0.258 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> select count (*) from (select l_orderkey, dir0 from l_3level t2 where t2.dir0 = 1 and t2.dir1='one' and t2.dir2 = '2015-7-12');
> +---------+
> | EXPR$0  |
> +---------+
> | 10      |
> +---------+
> {code}
> I attached the data set. Let me know if you need anything more.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
