Date: Sun, 14 Dec 2014 04:33:13 +0000 (UTC)
From: "Aman Sinha (JIRA)"
To: issues@drill.apache.org
Subject: [jira] [Assigned] (DRILL-1788) Conflicting column names in join

    [ https://issues.apache.org/jira/browse/DRILL-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aman Sinha reassigned DRILL-1788:
---------------------------------

    Assignee: Aman Sinha  (was: Jacques Nadeau)

> Conflicting column names in join
> --------------------------------
>
> Key: DRILL-1788
> URL: https://issues.apache.org/jira/browse/DRILL-1788
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Steven Phillips
> Assignee: Aman Sinha
> Fix For: 0.8.0
>
> Attachments: 0001-DRILL-1788-Test-query-for-case-insensitive-join.-Fix.patch, 0001-Workaround-for-CALCITE-528-Convert-field-names-to-lo.patch
>
>
> Drill doesn't support multiple columns within a batch having the same name.
> When doing a join where there are matching column names, the planner will insert a project to rename one of the columns to avoid this conflict.
> However, it appears that there is some case-sensitive matching somewhere in the code path, because there are some cases where this rewrite does not happen.
> For example, this query does perform the column rename (see 01-03):
> 0: jdbc:drill:> explain plan for select n3.n_name from (select n2.n_name from cp.`tpch/nation.parquet` n1, cp.`tpch/nation.parquet` n2 where n1.n_name = n2.n_name) n3 join cp.`tpch/nation.parquet` n4 on n3.n_name = n4.n_name;
> {code}
> +------------+------------+
> |    text    |    json    |
> +------------+------------+
> | 00-00    Screen
> 00-01      UnionExchange
> 01-01        Project(n_name=[$0])
> 01-02          HashJoin(condition=[=($0, $1)], joinType=[inner])
> 01-04            HashToRandomExchange(dist0=[[$0]])
> 02-01              Project(n_name=[$1])
> 02-02                HashJoin(condition=[=($0, $1)], joinType=[inner])
> 02-04                  HashToRandomExchange(dist0=[[$0]])
> 04-01                    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/nation.parquet]], selectionRoot=/tpch/nation.parquet, numFiles=1, columns=[`n_name`]]])
> 02-03                  Project(n_name0=[$0])
> 02-05                    HashToRandomExchange(dist0=[[$0]])
> 05-01                      Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/nation.parquet]], selectionRoot=/tpch/nation.parquet, numFiles=1, columns=[`n_name`]]])
> 01-03            Project(n_name0=[$0])
> 01-05              HashToRandomExchange(dist0=[[$0]])
> 03-01                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/nation.parquet]], selectionRoot=/tpch/nation.parquet, numFiles=1, columns=[`n_name`]]])
> {code}
> But if I change one of the letters in one of the identifiers to uppercase, the rename goes away:
> {code}
> 0: jdbc:drill:> explain plan for select n3.n_name from (select n2.n_name from cp.`tpch/nation.parquet` n1, cp.`tpch/nation.parquet` n2 where n1.N_name = n2.n_name) n3 join cp.`tpch/nation.parquet` n4 on n3.n_name = n4.n_name;
> +------------+------------+
> |    text    |    json    |
> +------------+------------+
> | 00-00    Screen
> 00-01      UnionExchange
> 01-01        Project(n_name=[$0])
> 01-02          HashJoin(condition=[=($0, $1)], joinType=[inner])
> 01-04            HashToRandomExchange(dist0=[[$0]])
> 02-01              Project(n_name=[$1])
> 02-02                HashJoin(condition=[=($0, $1)], joinType=[inner])
> 02-04                  HashToRandomExchange(dist0=[[$0]])
> 04-01                    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/nation.parquet]], selectionRoot=/tpch/nation.parquet, numFiles=1, columns=[`N_name`]]])
> 02-03                  Project(N_name0=[$0])
> 02-05                    HashToRandomExchange(dist0=[[$0]])
> 05-01                      Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/nation.parquet]], selectionRoot=/tpch/nation.parquet, numFiles=1, columns=[`N_name`]]])
> 01-03            HashToRandomExchange(dist0=[[$0]])
> 03-01              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/nation.parquet]], selectionRoot=/tpch/nation.parquet, numFiles=1, columns=[`N_name`]]])
> {code}
> Running this query without the rewrite results in failure:
> java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
>     at java.util.ArrayList.rangeCheck(ArrayList.java:604) ~[na:1.7.0_21]
>     at java.util.ArrayList.get(ArrayList.java:382) ~[na:1.7.0_21]
>     at org.apache.drill.exec.record.VectorContainer.getValueAccessorById(VectorContainer.java:252) ~[drill-java-exec-0.7.0-incubating-SNAPSHOT-rebuffed.jar:0.7.0-incubating-SNAPSHOT]
>     at org.apache.drill.exec.record.AbstractRecordBatch.getValueAccessorById(AbstractRecordBatch.java:153) ~[drill-java-exec-0.7.0-incubating-SNAPSHOT-rebuffed.jar:0.7.0-incubating-SNAPSHOT]
>     at org.apache.drill.exec.test.generated.HashJoinProbeGen249.doSetup(HashJoinProbeTemplate.java:46) ~[na:na]
>     at org.apache.drill.exec.test.generated.HashJoinProbeGen249.setupHashJoinProbe(HashJoinProbeTemplate.java:97) ~[na:na]
>     at org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext(HashJoinBatch.java:226) ~[drill-java-exec-0.7.0-incubating-SNAPSHOT-rebuffed.jar:0.7.0-incubating-SNAPSHOT]

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
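For illustration, the rename step described in this report can be sketched in Java. This is a minimal, hypothetical sketch, not Drill's or Calcite's actual planner code: the class `FieldNameUniquifier`, the method `uniquify`, and the `caseSensitive` flag are invented for this example, and the numeric-suffix scheme is only assumed from the `n_name0` names visible in the plans. It shows how a case-sensitive comparison treats `N_name` and `n_name` as distinct names, so no rename is generated, even though Drill treats the two spellings as the same column.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical sketch (not Drill or Calcite code): make join output field names
 * unique by appending a numeric suffix, e.g. n_name -> n_name0.
 */
public class FieldNameUniquifier {

  /** Returns a copy of names in which no two entries collide under the chosen matching mode. */
  static List<String> uniquify(List<String> names, boolean caseSensitive) {
    Set<String> seen = new HashSet<>();   // comparison keys already taken
    List<String> result = new ArrayList<>();
    for (String name : names) {
      String candidate = name;
      int suffix = 0;
      // Set.add returns false when the key is already present, i.e. a conflict.
      while (!seen.add(key(candidate, caseSensitive))) {
        candidate = name + suffix++;      // n_name -> n_name0 -> n_name1 ...
      }
      result.add(candidate);
    }
    return result;
  }

  /** Case-insensitive mode compares lower-cased names, so N_name and n_name collide. */
  private static String key(String name, boolean caseSensitive) {
    return caseSensitive ? name : name.toLowerCase(Locale.ROOT);
  }

  public static void main(String[] args) {
    List<String> joinFields = Arrays.asList("N_name", "n_name");
    System.out.println(uniquify(joinFields, true));   // [N_name, n_name]: conflict not detected
    System.out.println(uniquify(joinFields, false));  // [N_name, n_name0]: rename inserted
  }
}
```

With case-insensitive matching the second field becomes `n_name0`, the kind of rename the first plan shows at 01-03; with case-sensitive matching both names pass through unchanged, which mirrors the missing Project at 01-03 in the second plan.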