hive-dev mailing list archives

From "Harish Butani (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-7905) CBO: more cost model changes
Date Fri, 05 Sep 2014 18:20:29 GMT

    [ https://issues.apache.org/jira/browse/HIVE-7905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123309#comment-14123309 ]

Harish Butani commented on HIVE-7905:
-------------------------------------

Review board link: https://reviews.apache.org/r/25179/

> CBO: more cost model changes
> ----------------------------
>
>                 Key: HIVE-7905
>                 URL: https://issues.apache.org/jira/browse/HIVE-7905
>             Project: Hive
>          Issue Type: Sub-task
>          Components: CBO
>            Reporter: Harish Butani
>            Assignee: Harish Butani
>         Attachments: HIVE-7905.2.patch, exp-backoff-vs-log-smoothing
>
>
> 1. For composite predicates, smooth the selectivity calculation using +exponential backoff+. Thanks to [~mmokhtar] for this formula.
> {quote}
> Can you change the algorithm to use exponential back-off:
> ndv(pe0) * ndv(pe1)^(1/2) * ndv(pe2)^(1/4) * ndv(pe3)^(1/8)
> as opposed to:
> ndv(pex) * log(ndv(pe1)) * log(ndv(pe2))
> If we assume a selectivity of 0.7 for each store_sales join, then the join selectivity can end up being 6.24285E-05, which is too low and eventually results in a suboptimal plan.
> {quote}
> See attached picture.
> 2. For Fact - Dim joins on the Dim primary key, we infer the join cardinality as a filter on the Fact table:
> {code}
> join card = rowCount(Fact table) * selectivity(dim table)
> {code}
> Whether a Column is a Key is inferred based on either:
> * table rowCount = column ndv
> * (tbd shortly) table rowCount = (maxVal - minVal)
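
To make the two rules in the description concrete, here is a minimal Python sketch. It is not taken from the patch. The function names are hypothetical, the predicates are combined in the order given, the log in the older formula is assumed to be the natural log, and the code assumes that join selectivity is taken as the reciprocal of the combined ndv. The "(tbd shortly)" min/max key rule is left out because the description does not define it yet.

```python
import math


def backoff_ndv(ndvs):
    """Combine per-predicate ndvs as ndv0 * ndv1^(1/2) * ndv2^(1/4) * ...

    The order of the predicates is taken as given (an assumption).
    """
    combined = 1.0
    for i, ndv in enumerate(ndvs):
        # The exponent halves for each further predicate: 1, 1/2, 1/4, 1/8, ...
        combined *= ndv ** (0.5 ** i)
    return combined


def log_smoothed_ndv(ndvs):
    """The older form from the quote: ndv0 * log(ndv1) * log(ndv2) * ...

    Natural log is an assumption; the description does not name a base.
    """
    combined = float(ndvs[0])
    for ndv in ndvs[1:]:
        combined *= math.log(ndv)
    return combined


def join_selectivity(ndvs):
    """Assumed relation: selectivity = 1 / combined ndv."""
    return 1.0 / backoff_ndv(ndvs)


def is_key(table_row_count, column_ndv):
    """First inference rule: a column is a key if table rowCount == column ndv."""
    return table_row_count == column_ndv


def fact_dim_join_card(fact_row_count, dim_selectivity):
    """join card = rowCount(Fact table) * selectivity(dim table)."""
    return fact_row_count * dim_selectivity


if __name__ == "__main__":
    ndvs = [100, 16, 16]
    print(backoff_ndv(ndvs))       # 100 * 16^(1/2) * 16^(1/4), about 800
    print(log_smoothed_ndv(ndvs))  # 100 * ln(16) * ln(16)
    if is_key(1000, 1000):
        # The Dim column looks like a key, so treat the join as a Fact filter.
        print(fact_dim_join_card(1_000_000, 0.25))
```

With exponential backoff, each additional predicate contributes less to the combined ndv than the one before it, so the estimated selectivity shrinks more slowly than it would under a plain product of ndvs.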



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
