From: "Harish Butani (JIRA)"
To: hive-dev@hadoop.apache.org
Date: Tue, 22 Apr 2014 18:44:17 +0000 (UTC)
Subject: [jira] [Created] (HIVE-6955) ExprNodeColDesc isSame doesn't account for tabAlias: this affects trait Propagation in Joins

Harish Butani created HIVE-6955:
-----------------------------------

             Summary: ExprNodeColDesc isSame doesn't account for tabAlias: this affects trait Propagation in Joins
                 Key: HIVE-6955
                 URL: https://issues.apache.org/jira/browse/HIVE-6955
             Project: Hive
          Issue Type: Bug
            Reporter: Harish Butani

For tpcds Q15:
{code}
explain
select ca_zip, sum(cs_sales_price)
from catalog_sales, customer, customer_address, date_dim
where catalog_sales.cs_bill_customer_sk = customer.c_customer_sk
  and customer.c_current_addr_sk = customer_address.ca_address_sk and
      (substr(ca_zip,1,5) in ('85669', '86197','88274','83405','86475',
                              '85392', '85460', '80348', '81792')
       or ca_state in ('CA','WA','GA')
       or cs_sales_price > 500)
  and catalog_sales.cs_sold_date_sk = date_dim.d_date_sk
  and d_qoy = 2 and d_year = 2001
group by ca_zip
order by ca_zip
limit 100;
{code}

The traits set up for the operators are:
{code}
FIL[23]: bucketCols=[[]],numBuckets=-1
RS[11]: bucketCols=[[VALUE._col0]],numBuckets=-1
JOIN[12]: bucketCols=[[_col71], [_col71]],numBuckets=-1
FIL[13]: bucketCols=[[_col71], [_col71]],numBuckets=-1
SEL[14]: bucketCols=[[_col71], [_col71]],numBuckets=-1
GBY[15]: bucketCols=[[_col0]],numBuckets=-1
RS[16]: bucketCols=[[KEY._col0]],numBuckets=-1
GBY[17]: bucketCols=[[_col0]],numBuckets=-1
SEL[18]: bucketCols=[[_col0]],numBuckets=-1
LIM[21]: bucketCols=[[_col0]],numBuckets=-1
FS[22]: bucketCols=[[_col0]],numBuckets=-1
TS[3]: bucketCols=[[]],numBuckets=-1
RS[5]: bucketCols=[[VALUE._col0]],numBuckets=-1
JOIN[6]: bucketCols=[[_col3], [_col36]],numBuckets=-1
RS[7]: bucketCols=[[VALUE._col40]],numBuckets=-1
JOIN[9]: bucketCols=[[_col40], [_col0]],numBuckets=-1
RS[10]: bucketCols=[[VALUE._col0]],numBuckets=-1
TS[1]: bucketCols=[[]],numBuckets=-1
RS[8]: bucketCols=[[VALUE._col0]],numBuckets=-1
TS[0]: bucketCols=[[]],numBuckets=-1
RS[4]: bucketCols=[[VALUE._col3]],numBuckets=-1
{code}

This is incorrect: JOIN[9] joins ca with (cs join cust), and in this case both sides of the join have a '_col0' column. The reverse mapping used in trait propagation relies on ExprNodeColumnDesc.isSame; since this doesn't account for the tabAlias, we end up with JOIN[9] being bucketed on cs_sold_date_sk. JOIN[12] has the same issue, which only compounds the error.
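
To make the failure mode concrete, here is a minimal, self-contained sketch of how a name-only comparison can pick the wrong join input during a reverse lookup. It is not Hive code: ColRef, isSameIgnoringAlias, isSameWithAlias, reverseLookup and the "left"/"right" aliases are all hypothetical names chosen for illustration, and the real trait-propagation logic is more involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

final class TabAliasDemo {

    /** Hypothetical stand-in for a column expression; NOT Hive's ExprNodeColumnDesc. */
    static final class ColRef {
        final String column;   // internal column name, e.g. "_col0"
        final String tabAlias; // alias of the join input that produced the column

        ColRef(String column, String tabAlias) {
            this.column = column;
            this.tabAlias = tabAlias;
        }

        /** Compares the column name only, which is the behaviour the report describes. */
        boolean isSameIgnoringAlias(ColRef other) {
            return column.equals(other.column);
        }

        /** Compares the column name and also requires the table alias to match. */
        boolean isSameWithAlias(ColRef other) {
            return column.equals(other.column)
                && Objects.equals(tabAlias, other.tabAlias);
        }
    }

    /**
     * Reverse lookup: index of the first join output whose source expression
     * matches the bucketing column, or -1 if there is none.
     */
    static int reverseLookup(List<ColRef> outputSources, ColRef bucketCol, boolean useAlias) {
        for (int i = 0; i < outputSources.size(); i++) {
            ColRef src = outputSources.get(i);
            boolean same = useAlias
                ? src.isSameWithAlias(bucketCol)
                : src.isSameIgnoringAlias(bucketCol);
            if (same) {
                return i;
            }
        }
        return -1;
    }

    /** Output 0 is fed by the left input's _col0, output 1 by the right input's _col0. */
    static List<ColRef> sampleJoinOutputs() {
        List<ColRef> outputs = new ArrayList<>();
        outputs.add(new ColRef("_col0", "left"));
        outputs.add(new ColRef("_col0", "right"));
        return outputs;
    }

    public static void main(String[] args) {
        List<ColRef> outputs = sampleJoinOutputs();
        // Assume the right input is the one bucketed on its _col0.
        ColRef bucketCol = new ColRef("_col0", "right");
        System.out.println("name only:  output " + reverseLookup(outputs, bucketCol, false));
        System.out.println("with alias: output " + reverseLookup(outputs, bucketCol, true));
    }
}
```

With the name-only comparison the lookup stops at output 0, which belongs to the other join input; once the alias is compared as well it resolves to output 1. This mirrors the mix-up described for JOIN[9], where both sides expose a '_col0' column.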