hive-dev mailing list archives

From "Rajesh Balamohan (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-15339) Prefetch column stats for fields needed in FilterSelectivityEstimator
Date Fri, 02 Dec 2016 09:57:58 GMT
Rajesh Balamohan created HIVE-15339:
---------------------------------------

             Summary: Prefetch column stats for fields needed in FilterSelectivityEstimator
                 Key: HIVE-15339
                 URL: https://issues.apache.org/jira/browse/HIVE-15339
             Project: Hive
          Issue Type: Improvement
            Reporter: Rajesh Balamohan
            Priority: Minor



Depending on the query pattern, {{FilterSelectivityEstimator}} fetches column statistics from
the metastore in multiple calls. For instance, in the query below, it ends up fetching
individual column statistics for {{flights}} multiple times.

When the table has a large number of partitions, fetching column statistics via multiple
calls can be very expensive, which adversely impacts the overall compilation time. The
following query took 14 seconds to compile.

{noformat}
SELECT COUNT(`flights`.`flightnum`) AS `cnt_flightnum_ok`,
YEAR(`flights`.`dateofflight`) AS `yr_flightdate_ok`
FROM `flights` as `flights`
JOIN `airlines` ON (`flights`.`uniquecarrier` = `airlines`.`code`)
JOIN `airports` as `source_airport` ON (`flights`.`origin` = `source_airport`.`iata`)
JOIN `airports` as `dest_airport` ON (`flights`.`dest` = `dest_airport`.`iata`)
GROUP BY YEAR(`flights`.`dateofflight`);
{noformat}

It may be helpful to collect all columns that need statistics up front and fetch their
statistics in a single remote call.
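
As a rough illustration of the idea only (not Hive's actual code), the sketch below gathers the columns that need statistics and fetches them in one batched call, then serves later lookups from a local cache. Every name here ({{StatsSource}}, {{CountingSource}}, {{PrefetchingStats}}) is hypothetical, the "remote" source is an in-memory stand-in that just counts calls, and the statistic value is fake.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

public class ColumnStatsPrefetchSketch {

    // Hypothetical stand-in for a remote metastore call: one invocation = one round trip.
    interface StatsSource {
        Map<String, Long> fetch(String table, List<String> columns);
    }

    // Fake source that counts round trips and returns a made-up value per column.
    static class CountingSource implements StatsSource {
        int remoteCalls = 0;

        @Override
        public Map<String, Long> fetch(String table, List<String> columns) {
            remoteCalls++;
            Map<String, Long> out = new HashMap<>();
            for (String c : columns) {
                out.put(c, (long) c.length()); // placeholder, not a real statistic
            }
            return out;
        }
    }

    // Caches statistics so that columns fetched once are not requested again.
    static class PrefetchingStats {
        private final StatsSource source;
        private final Map<String, Long> cache = new HashMap<>();

        PrefetchingStats(StatsSource source) {
            this.source = source;
        }

        // Fetch all not-yet-cached columns in a single call; duplicates are dropped.
        void prefetch(String table, Collection<String> columns) {
            List<String> missing = new ArrayList<>();
            for (String c : new LinkedHashSet<>(columns)) {
                if (!cache.containsKey(table + "." + c)) {
                    missing.add(c);
                }
            }
            if (missing.isEmpty()) {
                return;
            }
            for (Map.Entry<String, Long> e : source.fetch(table, missing).entrySet()) {
                cache.put(table + "." + e.getKey(), e.getValue());
            }
        }

        // Falls back to a single-column fetch when the column was not prefetched.
        long get(String table, String column) {
            prefetch(table, Collections.singletonList(column));
            return cache.get(table + "." + column);
        }
    }

    public static void main(String[] args) {
        List<String> joinColumns = Arrays.asList("uniquecarrier", "origin", "dest");

        // Without prefetching: each lookup is its own round trip.
        CountingSource perColumn = new CountingSource();
        PrefetchingStats lazy = new PrefetchingStats(perColumn);
        for (String c : joinColumns) {
            lazy.get("flights", c);
        }
        System.out.println("per-column calls: " + perColumn.remoteCalls);

        // With prefetching: one batched round trip, later lookups hit the cache.
        CountingSource batched = new CountingSource();
        PrefetchingStats eager = new PrefetchingStats(batched);
        eager.prefetch("flights", joinColumns);
        for (String c : joinColumns) {
            eager.get("flights", c);
        }
        System.out.println("batched calls: " + batched.remoteCalls);
    }
}
```

The column names are the {{flights}} join keys from the query above. How the needed columns would actually be collected from the filter and join predicates is left out of this sketch.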



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
