hive-dev mailing list archives

From "Rajesh Balamohan (Jira)" <j...@apache.org>
Subject [jira] [Created] (HIVE-23788) FilterStatsRule misestimate causes hashtable computation to rehash often
Date Wed, 01 Jul 2020 07:23:00 GMT
Rajesh Balamohan created HIVE-23788:
---------------------------------------

             Summary: FilterStatsRule misestimate causes hashtable computation to rehash often
                 Key: HIVE-23788
                 URL: https://issues.apache.org/jira/browse/HIVE-23788
             Project: Hive
          Issue Type: Improvement
            Reporter: Rajesh Balamohan


Depending on the available statistics, FilterStatsRule sometimes estimates the row count as numRows/3.
This causes a lower keyCount to be projected for the hashtable computation, which in turn causes frequent rehashing.
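
To illustrate why a low keyCount matters, here is a minimal sketch of a generic power-of-two hash table that is sized from an estimated key count and doubles whenever it exceeds its load factor. It is not Hive's actual hashtable implementation: the class name RehashSketch, the methods fallbackEstimate, initialCapacity and countResizes, and the 0.75 load factor are all assumptions made for illustration. Only the numRows/3 fallback comes from the description above.

```java
// Toy model only; none of these names exist in Hive.
class RehashSketch {
    // Assumed load factor; Hive's real value may differ.
    static final double LOAD_FACTOR = 0.75;

    // The fallback described in this issue: one third of the input rows.
    static long fallbackEstimate(long numRows) {
        return numRows / 3;
    }

    // Smallest power-of-two capacity that holds estimatedKeys
    // without exceeding the load factor.
    static long initialCapacity(long estimatedKeys) {
        long needed = (long) Math.ceil(estimatedKeys / LOAD_FACTOR);
        long capacity = 1;
        while (capacity < needed) {
            capacity <<= 1;
        }
        return capacity;
    }

    // Number of times the table must double (rehashing every entry)
    // before it can hold actualKeys.
    static int countResizes(long actualKeys, long estimatedKeys) {
        long capacity = initialCapacity(estimatedKeys);
        int resizes = 0;
        while (actualKeys > capacity * LOAD_FACTOR) {
            capacity <<= 1;
            resizes++;
        }
        return resizes;
    }

    public static void main(String[] args) {
        long actualKeys = 3_000_000L;
        System.out.println("resizes with numRows/3 estimate: "
                + countResizes(actualKeys, fallbackEstimate(actualKeys)));
        System.out.println("resizes with accurate estimate:  "
                + countResizes(actualKeys, actualKeys));
    }
}
```

In this model every resize touches all entries inserted so far, so the further the estimate falls below the real key count, the more rehash passes are paid while the table is being built.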

[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L952]

[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L1192]

E.g., TPC-DS Q74 at 10 TB: while evaluating the conditions "t_s_firstyear.year_total > 0, t_w_secyear.year_total
/ t_w_firstyear.year_total , t_s_secyear.year_total / t_s_firstyear.year_total ",
it projects 1/3 of the rows, which causes the hashtable in the downstream vertex to be rehashed.

We may need to check whether stats can be projected correctly for these columns.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
