hive-dev mailing list archives

From "Mostafa Mokhtar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-8196) Joining on partition columns with fetch column stats enabled results in very small CE which negatively affects query performance
Date Fri, 19 Sep 2014 22:28:37 GMT

    [ https://issues.apache.org/jira/browse/HIVE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141440#comment-14141440 ]

Mostafa Mokhtar commented on HIVE-8196:
---------------------------------------

[~hagleitn]

> Joining on partition columns with fetch column stats enabled results in very small CE which negatively affects query performance
> ---------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-8196
>                 URL: https://issues.apache.org/jira/browse/HIVE-8196
>             Project: Hive
>          Issue Type: Bug
>          Components: Physical Optimizer
>    Affects Versions: 0.14.0
>            Reporter: Mostafa Mokhtar
>            Assignee: Prasanth J
>            Priority: Critical
>              Labels: performance
>             Fix For: 0.14.0
>
>
> To get the most out of dynamic partition pruning, joins should be on the partitioning columns, so that partitions of the fact table are pruned dynamically based on the qualifying join keys from the dimension table. With fetch column stats enabled, however, this type of join negatively affects cardinality estimates.
> Currently we don't have statistics for partition columns, so NDV (number of distinct values) is set to the row count, which distorts the estimated join selectivity.
> A workaround is to capture statistics for partition columns, or to use the number of partitions in case dynamic partitioning is used.
> StatsUtils.getColStatisticsFromExpression is where the distinct count gets set to the row count:
> {code}
>   if (encd.getIsPartitionColOrVirtualCol()) {
>     // partition or virtual column: no column stats, so NDV falls back to the row count
>     colType = encd.getTypeInfo().getTypeName();
>     countDistincts = numRows;
>     oi = encd.getWritableObjectInspector();
> {code}
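
To illustrate why an NDV equal to the row count collapses the estimate, here is a minimal sketch using the textbook equi-join formula, rows(R) * rows(S) / max(NDV(R.key), NDV(S.key)). It is not Hive's actual join stats rule, which may differ in detail. The class name, method name, and all row, NDV, and partition counts are hypothetical values chosen for illustration only.

```java
// Hypothetical sketch; not taken from the Hive code base.
class JoinCardinalitySketch {

    // Textbook equi-join estimate: |R| * |S| / max(NDV(R.key), NDV(S.key)).
    static long estimateJoinRows(long leftRows, long leftNdv,
                                 long rightRows, long rightNdv) {
        // Guard against a zero denominator when no NDV is known.
        long denominator = Math.max(1L, Math.max(leftNdv, rightNdv));
        return (leftRows * rightRows) / denominator;
    }

    public static void main(String[] args) {
        long factRows = 1_000_000_000L; // assumed row count of the fact table
        long dimRows = 365L;            // assumed dimension rows left after the year filter
        long dimNdv = 365L;             // assumed distinct join keys on the dimension side
        long partitions = 1_800L;       // assumed number of fact-table partitions

        // NDV of the partition column set to the row count (the behavior described above).
        long withRowCountNdv = estimateJoinRows(factRows, factRows, dimRows, dimNdv);

        // NDV of the partition column set to the number of partitions (the suggested workaround).
        long withPartitionNdv = estimateJoinRows(factRows, partitions, dimRows, dimNdv);

        System.out.println("NDV = row count:       " + withRowCountNdv);
        System.out.println("NDV = partition count: " + withPartitionNdv);
    }
}
```

With these assumed inputs the first estimate is 365 rows and the second is roughly 200 million, which shows how far the row-count fallback can pull the join cardinality down.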
> Query used to repro the issue :
> {code}
> set hive.stats.fetch.column.stats=true;
> set hive.tez.dynamic.partition.pruning=true;
> explain select d_date 
> from store_sales, date_dim 
> where 
> store_sales.ss_sold_date_sk = date_dim.d_date_sk and 
> date_dim.d_year = 1998;
> {code}
> Plan 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
