Return-Path: X-Original-To: apmail-drill-issues-archive@minotaur.apache.org Delivered-To: apmail-drill-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 238AC193B5 for ; Mon, 7 Mar 2016 22:56:41 +0000 (UTC) Received: (qmail 49445 invoked by uid 500); 7 Mar 2016 22:56:41 -0000 Delivered-To: apmail-drill-issues-archive@drill.apache.org Received: (qmail 49413 invoked by uid 500); 7 Mar 2016 22:56:41 -0000 Mailing-List: contact issues-help@drill.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@drill.apache.org Delivered-To: mailing list issues@drill.apache.org Received: (qmail 49389 invoked by uid 99); 7 Mar 2016 22:56:40 -0000 Received: from arcas.apache.org (HELO arcas) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 07 Mar 2016 22:56:40 +0000 Received: from arcas.apache.org (localhost [127.0.0.1]) by arcas (Postfix) with ESMTP id B36F72C14FB for ; Mon, 7 Mar 2016 22:56:40 +0000 (UTC) Date: Mon, 7 Mar 2016 22:56:40 +0000 (UTC) From: "Victoria Markman (JIRA)" To: issues@drill.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Comment Edited] (DRILL-4477) Wrong Plan (potentially wrong result) if wrapping a query with SELECT * FROM MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/DRILL-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183749#comment-15183749 ] Victoria Markman edited comment on DRILL-4477 at 3/7/16 10:55 PM: ------------------------------------------------------------------ Same in 1.4.0: {code} 0: jdbc:drill:drillbit=localhost> select * from sys.version; +----------+-------------------------------------------+-------------------------------------------------------------------------+----------------------------+--------------+----------------------------+ | version | commit_id | commit_message | commit_time | build_email | build_time | +----------+-------------------------------------------+-------------------------------------------------------------------------+----------------------------+--------------+----------------------------+ | 1.4.0 | 556d9c5dbf4b0f776b57f0d39d07111ccc0d2e7b | MD-649: do the cost adjustment for $SUM0 only when LIMIT 0 is present. | 07.01.2016 @ 18:19:31 UTC | Unknown | 07.01.2016 @ 19:52:20 UTC | +----------+-------------------------------------------+-------------------------------------------------------------------------+----------------------------+--------------+----------------------------+ 1 row selected (1.082 seconds) 0: jdbc:drill:drillbit=localhost> select * from (select t1.a, t1.b, t1.c, t2.a, t2.b, t2.c from `tt1.json` t1, `tt2.json` t2 where t1.c = t2.c); +----+----+----+-----+-----+-----+ | a | b | c | a0 | b0 | c0 | +----+----+----+-----+-----+-----+ | a | b | 1 | a | b | 1 | +----+----+----+-----+-----+-----+ 1 row selected (2.005 seconds) {code} Star is not the factor here either. I get wrong result even when I explicitly name columns: {code} 0: jdbc:drill:drillbit=localhost> select a,b,c,d,e,f from (select t1.a, t1.b, t1.c, t2.a, t2.b, t2.c from `tt1.json` t1, `tt2.json` t2 where t1.c = t2.c) as t(a,b,c,d,e,f); +----+----+----+----+----+----+ | a | b | c | d | e | f | +----+----+----+----+----+----+ | a | b | 1 | a | b | 1 | +----+----+----+----+----+----+ 1 row selected (0.342 seconds) {code} I went all the way back to 1.0.0 to see if it reproduces there, because I was convinced that I've seen exactly the same problem before. It turns out that it reproduces there as well. Perplexing that we did not find it until now ... was (Author: vicky): Same in 1.4.0: {code} 0: jdbc:drill:drillbit=localhost> select * from sys.version; +----------+-------------------------------------------+-------------------------------------------------------------------------+----------------------------+--------------+----------------------------+ | version | commit_id | commit_message | commit_time | build_email | build_time | +----------+-------------------------------------------+-------------------------------------------------------------------------+----------------------------+--------------+----------------------------+ | 1.4.0 | 556d9c5dbf4b0f776b57f0d39d07111ccc0d2e7b | MD-649: do the cost adjustment for $SUM0 only when LIMIT 0 is present. | 07.01.2016 @ 18:19:31 UTC | Unknown | 07.01.2016 @ 19:52:20 UTC | +----------+-------------------------------------------+-------------------------------------------------------------------------+----------------------------+--------------+----------------------------+ 1 row selected (1.082 seconds) 0: jdbc:drill:drillbit=localhost> select * from (select t1.a, t1.b, t1.c, t2.a, t2.b, t2.c from `tt1.json` t1, `tt2.json` t2 where t1.c = t2.c); +----+----+----+-----+-----+-----+ | a | b | c | a0 | b0 | c0 | +----+----+----+-----+-----+-----+ | a | b | 1 | a | b | 1 | +----+----+----+-----+-----+-----+ 1 row selected (2.005 seconds) {code} > Wrong Plan (potentially wrong result) if wrapping a query with SELECT * FROM > ---------------------------------------------------------------------------- > > Key: DRILL-4477 > URL: https://issues.apache.org/jira/browse/DRILL-4477 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization > Reporter: Sean Hsuan-Yi Chu > Assignee: Sean Hsuan-Yi Chu > Priority: Blocker > Fix For: 1.6.0 > > Attachments: t1.json, t2.json > > > For example, a query > {code} > select * from (select s.name, v.name, v.registration from cp.`tpch/region.parquet` s left outer join cp.`tpch/nation.parquet` v > on (s.name = v.name) > where s.age < 30) t > {code} > gives a plan as below: > {code} > +------+------+ > | text | json | > +------+------+ > | 00-00 Screen > 00-01 Project(name=[$0], name0=[$1], registration=[$2]) > 00-02 Project(name=[$0], name0=[$0], registration=[$3]) > 00-03 Project(name=[$2], age=[$3], name0=[$0], registration=[$1]) > 00-04 HashJoin(condition=[=($2, $0)], joinType=[right]) > 00-06 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/nation.parquet]], selectionRoot=classpath:/tpch/nation.parquet, numFiles=1, usedMetadataFile=false, columns=[`name`, `registration`]]]) > 00-05 Project(name0=[$0], age=[$1]) > 00-07 SelectionVectorRemover > 00-08 Filter(condition=[<($1, 30)]) > 00-09 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/region.parquet]], selectionRoot=classpath:/tpch/region.parquet, numFiles=1, usedMetadataFile=false, columns=[`name`, `age`]]]) > {code} > In the line 00-02, both name and name0 point at the same incoming column (probably due to the JOIN CONDITION). > However. the fact that these two are the JOIN condition does not make a case that they must be equal since implicit casting might be invoked to perform the JOIN condition. > Interestingly, if the SELECT * FROM wrapper is removed, this bug won't be exposed: > {code} > select s.name, v.name, v.registration from cp.`tpch/region.parquet` s left outer join cp.`tpch/nation.parquet` v on (s.name = v.name) > where s.age < 30 > {code} > gives > {code} > 00-00 Screen > 00-01 Project(name=[$0], name0=[$1], registration=[$2]) > 00-02 Project(name=[$2], name0=[$0], registration=[$1]) > 00-03 HashJoin(condition=[=($2, $0)], joinType=[right]) > 00-05 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/nation.parquet]], selectionRoot=classpath:/tpch/nation.parquet, numFiles=1, usedMetadataFile=false, columns=[`name`, `registration`]]]) > 00-04 Project(name0=[$0]) > 00-06 Project(name=[$0]) > 00-07 SelectionVectorRemover > 00-08 Filter(condition=[<($1, 30)]) > 00-09 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/region.parquet]], selectionRoot=classpath:/tpch/region.parquet, numFiles=1, usedMetadataFile=false, columns=[`name`, `age`]]]) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)