hive-dev mailing list archives

From "Jesus Camacho Rodriguez (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-11110) Enable HiveJoinAddNotNullRule in CBO
Date Thu, 25 Jun 2015 14:37:04 GMT
Jesus Camacho Rodriguez created HIVE-11110:
----------------------------------------------

             Summary: Enable HiveJoinAddNotNullRule in CBO
                 Key: HIVE-11110
                 URL: https://issues.apache.org/jira/browse/HIVE-11110
             Project: Hive
          Issue Type: Bug
            Reporter: Jesus Camacho Rodriguez
            Assignee: Jesus Camacho Rodriguez


Query
{code}
select  count(*)
 from store_sales
     ,store_returns
     ,date_dim d1
     ,date_dim d2
 where d1.d_quarter_name = '2000Q1'
   and d1.d_date_sk = ss_sold_date_sk
   and ss_customer_sk = sr_customer_sk
   and ss_item_sk = sr_item_sk
   and ss_ticket_number = sr_ticket_number
   and sr_returned_date_sk = d2.d_date_sk
   and d2.d_quarter_name in ('2000Q1','2000Q2','2000Q3');
{code}

The store_sales table is partitioned on ss_sold_date_sk, which is also used in a join clause.
The join clause should add a filter “filterExpr: ss_sold_date_sk is not null”, which should
get pushed to the MetaStore when fetching the stats. Currently this is not done in CBO planning,
which results in the stats from __HIVE_DEFAULT_PARTITION__ being fetched and considered in
the optimization phase. In particular, this inflates the NDV (number of distinct values)
estimate for the join columns and may lead to a poor plan.

Including HiveJoinAddNotNullRule in the optimization phase solves this issue.
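
As a rough illustration only (hand-written, not output produced by the rule), the effect on
the partition column can be sketched by adding the predicate to the query manually. An inner
equi-join never matches rows whose join key is NULL, so the extra filter does not change the
result. The rule itself may add similar predicates for other join keys; only the one discussed
above is shown here.
{code}
select  count(*)
 from store_sales
     ,store_returns
     ,date_dim d1
     ,date_dim d2
 where d1.d_quarter_name = '2000Q1'
   and d1.d_date_sk = ss_sold_date_sk
   -- explicit form of the filter the join should imply on the partition column
   and ss_sold_date_sk is not null
   and ss_customer_sk = sr_customer_sk
   and ss_item_sk = sr_item_sk
   and ss_ticket_number = sr_ticket_number
   and sr_returned_date_sk = d2.d_date_sk
   and d2.d_quarter_name in ('2000Q1','2000Q2','2000Q3');
{code}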



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
