hive-dev mailing list archives

From "Gunther Hagleitner (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-6157) Fetching column stats slower than the 101 during rush hour
Date Tue, 07 Jan 2014 19:45:52 GMT
Gunther Hagleitner created HIVE-6157:
----------------------------------------

             Summary: Fetching column stats slower than the 101 during rush hour
                 Key: HIVE-6157
                 URL: https://issues.apache.org/jira/browse/HIVE-6157
             Project: Hive
          Issue Type: Bug
    Affects Versions: 0.13.0
            Reporter: Gunther Hagleitner


"hive.stats.fetch.column.stats" controls whether the column stats for a table are fetched
during explain (in Tez: during query planning). On my setup (1 table, 4000 partitions, 24 columns),
the time spent in semantic analysis goes from ~1 second to ~66 seconds when the flag is turned
on. That's 65 seconds spent fetching column stats.
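
To put those numbers in perspective, here is a back-of-the-envelope sketch. It assumes (not measured) exactly one metastore round trip per column per partition. Under that assumption this setup needs 4000 × 24 = 96,000 calls, so the extra ~65 seconds works out to well under a millisecond per call:

```python
# Back-of-the-envelope cost model for fetching column stats.
# Assumption (not measured): one metastore round trip per (partition, column) pair.


def per_call_latency_ms(partitions: int, columns: int, extra_seconds: float) -> float:
    """Average latency per metastore call implied by the observed slowdown."""
    calls = partitions * columns
    return extra_seconds * 1000.0 / calls


partitions, columns = 4000, 24
extra_seconds = 66 - 1  # semantic analysis time with the flag on minus with it off

calls = partitions * columns
latency = per_call_latency_ms(partitions, columns, extra_seconds)
print(f"{calls} calls, ~{latency:.2f} ms each")  # 96000 calls, ~0.68 ms each
```

Even cheap individual calls add up to about a minute at this volume. That is why cutting the number of calls is the first lever.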

The reason is probably that the APIs force you to make a separate metastore call for each column
in each partition. That's probably the first thing that has to change. The open question is whether,
in addition, we need to cache the stats in the client or store them as a single blob in the
database to cut the time down further. Either way, as things stand right now, column stats seem
unusable.
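
The two remedies raised above (fewer metastore calls first, then possibly a client-side cache) can be illustrated with a toy model. This is a minimal sketch under stated assumptions: `FakeMetastore`, `get_column_stats`, `get_column_stats_batch` and `CachingStatsClient` are hypothetical names that only count round trips. They are not Hive's actual metastore API, and the sketch does not model the "single blob" storage option.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (partition name, column name)


class FakeMetastore:
    """Hypothetical metastore stand-in that only counts round trips (not Hive's real API)."""

    def __init__(self) -> None:
        self.round_trips = 0

    def get_column_stats(self, partition: str, column: str) -> dict:
        # Pattern suspected in the report: one call per column per partition.
        self.round_trips += 1
        return {"partition": partition, "column": column}

    def get_column_stats_batch(self, partitions: List[str], columns: List[str]) -> Dict[Key, dict]:
        # Batched alternative: one call returns stats for every requested pair.
        self.round_trips += 1
        return {(p, c): {"partition": p, "column": c} for p in partitions for c in columns}


def fetch_per_column(ms: FakeMetastore, partitions: List[str], columns: List[str]) -> Dict[Key, dict]:
    return {(p, c): ms.get_column_stats(p, c) for p in partitions for c in columns}


class CachingStatsClient:
    """Hypothetical client-side cache in front of the batched call."""

    def __init__(self, ms: FakeMetastore) -> None:
        self.ms = ms
        self.cache: Dict[Key, dict] = {}

    def fetch(self, partitions: List[str], columns: List[str]) -> Dict[Key, dict]:
        # Only partitions with at least one uncached column go to the metastore,
        # in a single batched call (some cached pairs may be re-fetched; kept simple).
        missing = [p for p in partitions if any((p, c) not in self.cache for c in columns)]
        if missing:
            self.cache.update(self.ms.get_column_stats_batch(missing, columns))
        return {(p, c): self.cache[(p, c)] for p in partitions for c in columns}


partitions = [f"ds={i}" for i in range(4000)]
columns = [f"col{j}" for j in range(24)]

naive = FakeMetastore()
fetch_per_column(naive, partitions, columns)
print("per-column calls:", naive.round_trips)  # 96000

batched = FakeMetastore()
client = CachingStatsClient(batched)
client.fetch(partitions, columns)
client.fetch(partitions, columns)  # a second planning pass is served from the cache
print("batched + cached calls:", batched.round_trips)  # 1
```

In this model, batching alone takes the count from 96,000 round trips to one. The cache only helps when the same stats are requested again, so it is the second-order fix.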

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)