hive-dev mailing list archives

From "Harish Butani (JIRA)"
Subject [jira] [Created] (HIVE-6955) ExprNodeColDesc isSame doesn't account for tabAlias: this affects trait Propagation in Joins
Date Tue, 22 Apr 2014 18:44:17 GMT
Harish Butani created HIVE-6955:

             Summary: ExprNodeColDesc isSame doesn't account for tabAlias: this affects trait Propagation in Joins
                 Key: HIVE-6955
             Project: Hive
          Issue Type: Bug
            Reporter: Harish Butani

For TPC-DS Q15:
select ca_zip, sum(cs_sales_price)
from catalog_sales, customer, customer_address, date_dim
where catalog_sales.cs_bill_customer_sk = customer.c_customer_sk
  and customer.c_current_addr_sk = customer_address.ca_address_sk
  and (substr(ca_zip,1,5) in ('85669', '86197','88274','83405','86475',
                              '85392', '85460', '80348', '81792')
       or ca_state in ('CA','WA','GA')
       or cs_sales_price > 500)
  and catalog_sales.cs_sold_date_sk = date_dim.d_date_sk
  and d_qoy = 2 and d_year = 2001
group by ca_zip
order by ca_zip
limit 100;

The traits set up for the operators are:
FIL[23]: bucketCols=[[]],numBuckets=-1
RS[11]: bucketCols=[[VALUE._col0]],numBuckets=-1
JOIN[12]: bucketCols=[[_col71], [_col71]],numBuckets=-1
FIL[13]: bucketCols=[[_col71], [_col71]],numBuckets=-1
SEL[14]: bucketCols=[[_col71], [_col71]],numBuckets=-1
GBY[15]: bucketCols=[[_col0]],numBuckets=-1
RS[16]: bucketCols=[[KEY._col0]],numBuckets=-1
GBY[17]: bucketCols=[[_col0]],numBuckets=-1
SEL[18]: bucketCols=[[_col0]],numBuckets=-1
LIM[21]: bucketCols=[[_col0]],numBuckets=-1
FS[22]: bucketCols=[[_col0]],numBuckets=-1
TS[3]: bucketCols=[[]],numBuckets=-1
RS[5]: bucketCols=[[VALUE._col0]],numBuckets=-1
JOIN[6]: bucketCols=[[_col3], [_col36]],numBuckets=-1
RS[7]: bucketCols=[[VALUE._col40]],numBuckets=-1
JOIN[9]: bucketCols=[[_col40], [_col0]],numBuckets=-1
RS[10]: bucketCols=[[VALUE._col0]],numBuckets=-1
TS[1]: bucketCols=[[]],numBuckets=-1
RS[8]: bucketCols=[[VALUE._col0]],numBuckets=-1
TS[0]: bucketCols=[[]],numBuckets=-1
RS[4]: bucketCols=[[VALUE._col3]],numBuckets=-1

This is incorrect.
JOIN[9] joins customer_address with the result of (catalog_sales join customer), so both
sides of the join have a '_col0' column. The reverse mapping in trait propagation relies on
ExprNodeColumnDesc.isSame. Since isSame doesn't account for the tabAlias, the two '_col0'
columns are treated as the same column, and JOIN[9] ends up bucketed on cs_sold_date_sk.
JOIN[12] has the same issue, which compounds the error.
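
To illustrate the name collision described above, here is a minimal Java sketch. It is not
Hive's source: ColRef is a hypothetical stand-in for ExprNodeColumnDesc, the method names
isSameIgnoringAlias and isSameWithAlias are made up for this example, and the aliases "ca"
and "cs_cust" are assumed labels for the two join inputs. It only shows why comparing
columns by name alone cannot tell the two '_col0' columns apart.

```java
import java.util.Objects;

// Hypothetical, simplified stand-in for a column reference.
// This is NOT Hive's actual ExprNodeColumnDesc.
final class ColRef {
    final String column;   // internal column name, e.g. "_col0"
    final String tabAlias; // alias of the join input the column comes from (assumed)

    ColRef(String column, String tabAlias) {
        this.column = column;
        this.tabAlias = tabAlias;
    }

    // Name-only comparison: the kind of check this report describes.
    boolean isSameIgnoringAlias(ColRef other) {
        return other != null && Objects.equals(column, other.column);
    }

    // Alias-aware comparison: one possible direction for a fix (an assumption).
    boolean isSameWithAlias(ColRef other) {
        return other != null
            && Objects.equals(column, other.column)
            && Objects.equals(tabAlias, other.tabAlias);
    }
}

class IsSameDemo {
    public static void main(String[] args) {
        // Both join inputs expose a column named "_col0".
        ColRef fromAddress = new ColRef("_col0", "ca");
        ColRef fromSalesJoin = new ColRef("_col0", "cs_cust");

        // The name-only check conflates the two columns.
        System.out.println(fromAddress.isSameIgnoringAlias(fromSalesJoin)); // true
        // Including the alias keeps them distinct.
        System.out.println(fromAddress.isSameWithAlias(fromSalesJoin));     // false
    }
}
```

With the name-only check, a reverse lookup of '_col0' can resolve to a column from either
join input, which is consistent with the wrong bucketing column reported for JOIN[9].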

This message was sent by Atlassian JIRA
