hive-dev mailing list archives

From "Damien Carol (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-7905) CBO: more cost model changes
Date Thu, 04 Sep 2014 07:34:51 GMT

     [ https://issues.apache.org/jira/browse/HIVE-7905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Damien Carol updated HIVE-7905:
-------------------------------
    Component/s: CBO

> CBO: more cost model changes
> ----------------------------
>
>                 Key: HIVE-7905
>                 URL: https://issues.apache.org/jira/browse/HIVE-7905
>             Project: Hive
>          Issue Type: Sub-task
>          Components: CBO
>            Reporter: Harish Butani
>            Assignee: Harish Butani
>         Attachments: exp-backoff-vs-log-smoothing
>
>
> 1. For composite predicates, smooth the selectivity calculation using +exponential backoff+. Thanks to [~ mmokhtar] for this formula.
> {quote}
> Can you change the algorithm to use exponential back-off:
> ndv(pe0) * ndv(pe1)^(1/2) * ndv(pe2)^(1/4) * ndv(pe3)^(1/8)
> as opposed to:
> ndv(pex)*log(ndv(pe1))*log(ndv(pe2))
> If we assume a selectivity of 0.7 for each store_sales join, then the join selectivity can end up being 6.24285E-05, which is too low and eventually results in a suboptimal plan.
> {quote}
> See attached picture.
> 2. For Fact-Dim joins on the Dim primary key, we infer the join cardinality as a filter on the Fact table:
> {code}
> join card = rowCount(Fact table) * selectivity(dim table)
> {code}
> A column is inferred to be a key if either:
> * table rowCount = column ndv
> * (tbd shortly) table rowCount = (maxVal - minVal)
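
The two changes described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Hive implementation: the function names (backoff_product, looks_like_key, fact_dim_join_cardinality) are hypothetical, the factors are combined in the order given because the issue does not say how pe0..peN are ordered, and only the first key-inference rule (rowCount = ndv) is covered since the second is still marked tbd.

```python
import math


def backoff_product(values):
    """Multiply the values, raising the i-th one to the power 1 / 2**i.

    Mirrors the shape ndv(pe0) * ndv(pe1)^(1/2) * ndv(pe2)^(1/4) * ...
    quoted in the issue. The input order is an assumption left to the caller.
    """
    result = 1.0
    for i, value in enumerate(values):
        result *= value ** (1.0 / 2 ** i)
    return result


def looks_like_key(row_count, ndv):
    """First inference rule from the issue: table rowCount = column ndv."""
    return row_count > 0 and ndv == row_count


def fact_dim_join_cardinality(fact_row_count, dim_selectivity):
    """join card = rowCount(Fact table) * selectivity(dim table)."""
    return fact_row_count * dim_selectivity


if __name__ == "__main__":
    # ndv-style use: 100 * 16**0.5 * 16**0.25
    print(backoff_product([100, 16, 16]))

    # Selectivity-style use: four predicates at 0.7 each (illustrative values).
    sels = [0.7] * 4
    print(math.prod(sels), backoff_product(sels))

    # Fact-Dim join where the Dim join column qualifies as a key.
    if looks_like_key(row_count=1000, ndv=1000):
        print(fact_dim_join_cardinality(1_000_000, 0.1))
```

Because the exponents 1, 1/2, 1/4, ... sum to less than 2, the damped product of selectivities never drops below the square of the smallest one, which is what keeps a long chain of predicates from collapsing toward zero.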



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
