drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Victoria Markman (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (DRILL-4477) Wrong Plan (potentially wrong result) if wrapping a query with SELECT * FROM
Date Mon, 07 Mar 2016 22:56:40 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183749#comment-15183749
] 

Victoria Markman edited comment on DRILL-4477 at 3/7/16 10:55 PM:
------------------------------------------------------------------

Same in 1.4.0:

{code}
0: jdbc:drill:drillbit=localhost> select * from sys.version;
+----------+-------------------------------------------+-------------------------------------------------------------------------+----------------------------+--------------+----------------------------+
| version  |                 commit_id                 |                             commit_message
                             |        commit_time         | build_email  |         build_time
        |
+----------+-------------------------------------------+-------------------------------------------------------------------------+----------------------------+--------------+----------------------------+
| 1.4.0    | 556d9c5dbf4b0f776b57f0d39d07111ccc0d2e7b  | MD-649: do the cost adjustment for
$SUM0 only when LIMIT 0 is present.  | 07.01.2016 @ 18:19:31 UTC  | Unknown      | 07.01.2016
@ 19:52:20 UTC  |
+----------+-------------------------------------------+-------------------------------------------------------------------------+----------------------------+--------------+----------------------------+
1 row selected (1.082 seconds)

0: jdbc:drill:drillbit=localhost> select * from (select t1.a, t1.b, t1.c, t2.a, t2.b, t2.c
from `tt1.json` t1, `tt2.json` t2 where t1.c = t2.c);
+----+----+----+-----+-----+-----+
| a  | b  | c  | a0  | b0  | c0  |
+----+----+----+-----+-----+-----+
| a  | b  | 1  | a   | b   | 1   |
+----+----+----+-----+-----+-----+
1 row selected (2.005 seconds)
{code}

Star is not the factor here either. I get wrong result even when I explicitly name columns:
{code}
0: jdbc:drill:drillbit=localhost> select a,b,c,d,e,f from (select t1.a, t1.b, t1.c, t2.a,
t2.b, t2.c from `tt1.json` t1, `tt2.json` t2 where t1.c = t2.c) as t(a,b,c,d,e,f);
+----+----+----+----+----+----+
| a  | b  | c  | d  | e  | f  |
+----+----+----+----+----+----+
| a  | b  | 1  | a  | b  | 1  |
+----+----+----+----+----+----+
1 row selected (0.342 seconds)
{code}

I went all the way back to 1.0.0 to see if it reproduces there, because I was convinced that
I've seen exactly the same problem before. It turns out that it reproduces there as well.
Perplexing that we did not find it until now ... 


was (Author: vicky):
Same in 1.4.0:

{code}
0: jdbc:drill:drillbit=localhost> select * from sys.version;
+----------+-------------------------------------------+-------------------------------------------------------------------------+----------------------------+--------------+----------------------------+
| version  |                 commit_id                 |                             commit_message
                             |        commit_time         | build_email  |         build_time
        |
+----------+-------------------------------------------+-------------------------------------------------------------------------+----------------------------+--------------+----------------------------+
| 1.4.0    | 556d9c5dbf4b0f776b57f0d39d07111ccc0d2e7b  | MD-649: do the cost adjustment for
$SUM0 only when LIMIT 0 is present.  | 07.01.2016 @ 18:19:31 UTC  | Unknown      | 07.01.2016
@ 19:52:20 UTC  |
+----------+-------------------------------------------+-------------------------------------------------------------------------+----------------------------+--------------+----------------------------+
1 row selected (1.082 seconds)

0: jdbc:drill:drillbit=localhost> select * from (select t1.a, t1.b, t1.c, t2.a, t2.b, t2.c
from `tt1.json` t1, `tt2.json` t2 where t1.c = t2.c);
+----+----+----+-----+-----+-----+
| a  | b  | c  | a0  | b0  | c0  |
+----+----+----+-----+-----+-----+
| a  | b  | 1  | a   | b   | 1   |
+----+----+----+-----+-----+-----+
1 row selected (2.005 seconds)
{code}

> Wrong Plan (potentially wrong result) if wrapping a query with SELECT * FROM
> ----------------------------------------------------------------------------
>
>                 Key: DRILL-4477
>                 URL: https://issues.apache.org/jira/browse/DRILL-4477
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>            Reporter: Sean Hsuan-Yi Chu
>            Assignee: Sean Hsuan-Yi Chu
>            Priority: Blocker
>             Fix For: 1.6.0
>
>         Attachments: t1.json, t2.json
>
>
> For example, a query  
> {code}
> select * from (select s.name, v.name, v.registration from cp.`tpch/region.parquet` s
left outer join cp.`tpch/nation.parquet` v
> on (s.name = v.name) 
> where s.age < 30) t 
> {code}
> gives a plan as below:
> {code}
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(name=[$0], name0=[$1], registration=[$2])
> 00-02        Project(name=[$0], name0=[$0], registration=[$3])
> 00-03          Project(name=[$2], age=[$3], name0=[$0], registration=[$1])
> 00-04            HashJoin(condition=[=($2, $0)], joinType=[right])
> 00-06              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/nation.parquet]],
selectionRoot=classpath:/tpch/nation.parquet, numFiles=1, usedMetadataFile=false, columns=[`name`,
`registration`]]])
> 00-05              Project(name0=[$0], age=[$1])
> 00-07                SelectionVectorRemover
> 00-08                  Filter(condition=[<($1, 30)])
> 00-09                    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
[path=classpath:/tpch/region.parquet]], selectionRoot=classpath:/tpch/region.parquet, numFiles=1,
usedMetadataFile=false, columns=[`name`, `age`]]])
> {code}
> In the line 00-02, both name and name0 point at the same incoming column (probably due
to the JOIN CONDITION). 
> However. the fact that these two are the JOIN condition does not make a case that they
must be equal since implicit casting might be invoked to perform the JOIN condition.
> Interestingly, if the SELECT * FROM wrapper is removed, this bug won't be exposed: 
> {code}
> select s.name, v.name, v.registration from cp.`tpch/region.parquet` s left outer join
cp.`tpch/nation.parquet` v on (s.name = v.name) 
> where s.age < 30
> {code}
> gives 
> {code}
> 00-00    Screen
> 00-01      Project(name=[$0], name0=[$1], registration=[$2])
> 00-02        Project(name=[$2], name0=[$0], registration=[$1])
> 00-03          HashJoin(condition=[=($2, $0)], joinType=[right])
> 00-05            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/nation.parquet]],
selectionRoot=classpath:/tpch/nation.parquet, numFiles=1, usedMetadataFile=false, columns=[`name`,
`registration`]]])
> 00-04            Project(name0=[$0])
> 00-06              Project(name=[$0])
> 00-07                SelectionVectorRemover
> 00-08                  Filter(condition=[<($1, 30)])
> 00-09                    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
[path=classpath:/tpch/region.parquet]], selectionRoot=classpath:/tpch/region.parquet, numFiles=1,
usedMetadataFile=false, columns=[`name`, `age`]]])
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message