hive-dev mailing list archives

From "Harish Butani (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-6955) ExprNodeColDesc isSame doesn't account for tabAlias: this affects trait Propagation in Joins
Date Mon, 28 Apr 2014 17:50:22 GMT

     [ https://issues.apache.org/jira/browse/HIVE-6955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harish Butani updated HIVE-6955:
--------------------------------

    Fix Version/s:     (was: 0.14.0)

> ExprNodeColDesc isSame doesn't account for tabAlias: this affects trait Propagation in Joins
> --------------------------------------------------------------------------------------------
>
>                 Key: HIVE-6955
>                 URL: https://issues.apache.org/jira/browse/HIVE-6955
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Harish Butani
>            Assignee: Harish Butani
>         Attachments: HIVE-6955.1.patch
>
>
> For tpcds Q15:
> {code}
> explain
> select ca_zip, sum(cs_sales_price)
> from catalog_sales, customer, customer_address, date_dim
> where catalog_sales.cs_bill_customer_sk = customer.c_customer_sk
>   and customer.c_current_addr_sk = customer_address.ca_address_sk
>   and (substr(ca_zip,1,5) in ('85669', '86197','88274','83405','86475',
>                               '85392', '85460', '80348', '81792')
>        or ca_state in ('CA','WA','GA')
>        or cs_sales_price > 500)
>   and catalog_sales.cs_sold_date_sk = date_dim.d_date_sk
>   and d_qoy = 2 and d_year = 2001
> group by ca_zip
> order by ca_zip
> limit 100;
> {code}
> The traits set up for the operators are:
> {code}
> FIL[23]: bucketCols=[[]],numBuckets=-1
> RS[11]: bucketCols=[[VALUE._col0]],numBuckets=-1
> JOIN[12]: bucketCols=[[_col71], [_col71]],numBuckets=-1
> FIL[13]: bucketCols=[[_col71], [_col71]],numBuckets=-1
> SEL[14]: bucketCols=[[_col71], [_col71]],numBuckets=-1
> GBY[15]: bucketCols=[[_col0]],numBuckets=-1
> RS[16]: bucketCols=[[KEY._col0]],numBuckets=-1
> GBY[17]: bucketCols=[[_col0]],numBuckets=-1
> SEL[18]: bucketCols=[[_col0]],numBuckets=-1
> LIM[21]: bucketCols=[[_col0]],numBuckets=-1
> FS[22]: bucketCols=[[_col0]],numBuckets=-1
> TS[3]: bucketCols=[[]],numBuckets=-1
> RS[5]: bucketCols=[[VALUE._col0]],numBuckets=-1
> JOIN[6]: bucketCols=[[_col3], [_col36]],numBuckets=-1
> RS[7]: bucketCols=[[VALUE._col40]],numBuckets=-1
> JOIN[9]: bucketCols=[[_col40], [_col0]],numBuckets=-1
> RS[10]: bucketCols=[[VALUE._col0]],numBuckets=-1
> TS[1]: bucketCols=[[]],numBuckets=-1
> RS[8]: bucketCols=[[VALUE._col0]],numBuckets=-1
> TS[0]: bucketCols=[[]],numBuckets=-1
> RS[4]: bucketCols=[[VALUE._col3]],numBuckets=-1
> {code}
> This is incorrect:
> Join[9] joins ca with (cs join cust), and in this case both sides of the join have a '_col0' column.
> The reverse mapping in trait propagation relies on ExprNodeColumnDesc.isSame; since isSame doesn't
> account for the tabAlias, we end up with Join[9] being bucketed on cs_sold_date_sk.
> Join[12] has the same issue and only compounds the error.
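
The name collision described above can be illustrated with a minimal Java sketch. This is not Hive's code: `ColRef`, `isSameIgnoringAlias`, and `isSameWithAlias` are hypothetical names invented for the illustration, and the aliases "ca" and "cs" are assumed stand-ins for the two inputs of Join[9]. It only shows why a comparison on the internal column name alone cannot tell the two '_col0' columns apart, while one that also checks the table alias can.

```java
import java.util.Objects;

public class TabAliasDemo {
    /** Hypothetical, simplified column reference; NOT Hive's ExprNodeColumnDesc. */
    static final class ColRef {
        final String tabAlias; // assumed: alias of the join input that owns the column
        final String column;   // internal column name, e.g. "_col0"

        ColRef(String tabAlias, String column) {
            this.tabAlias = tabAlias;
            this.column = column;
        }

        /** Compares the internal column name only. */
        boolean isSameIgnoringAlias(ColRef other) {
            return other != null && column.equals(other.column);
        }

        /** Compares the column name and additionally requires the same table alias. */
        boolean isSameWithAlias(ColRef other) {
            return other != null
                && column.equals(other.column)
                && Objects.equals(tabAlias, other.tabAlias);
        }
    }

    public static void main(String[] args) {
        // Both join inputs expose a column named _col0, as reported for Join[9].
        ColRef left = new ColRef("ca", "_col0");
        ColRef right = new ColRef("cs", "_col0");

        System.out.println(left.isSameIgnoringAlias(right)); // true: the two sides are conflated
        System.out.println(left.isSameWithAlias(right));     // false: the two sides stay distinct
    }
}
```

With the name-only comparison, a reverse lookup from an output column to its source can match the wrong input, which is consistent with the wrong bucketing column reported for Join[9]. Whether the attached patch takes this exact approach is not stated in this message.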



