hive-dev mailing list archives

From: "Prasanth J (JIRA)"
Subject: [jira] [Created] (HIVE-5369) Annotate hive operator tree with statistics from metastore
Date: Thu, 26 Sep 2013 08:40:02 GMT

Prasanth J created HIVE-5369:

             Summary: Annotate hive operator tree with statistics from metastore
                 Key: HIVE-5369
             Project: Hive
          Issue Type: New Feature
          Components: Query Processor, Statistics
    Affects Versions: 0.13.0
            Reporter: Prasanth J
            Assignee: Prasanth J
             Fix For: 0.13.0

Currently, the statistics gathered at the table/partition level and at the column level are not
used during the query planning stage, although they can be used to optimize query plans. Basic
statistics such as uncompressed data size allow better estimation of the number of reducers.
Other statistics, such as the number of rows, the number of distinct values per column, and the
average column length, can be used by a Cost Based Optimizer (CBO) to select better query plans.

As a first step toward improving query planning, the statistics available in the metastore
should be attached to the Hive operator tree: the tree should be walked and each operator
annotated with statistics. The attached statistics will vary from operator to operator,
depending on the operation performed. For example, a select operator changes the average row
size but does not affect the number of rows. Similarly, a filter operator changes the number of
rows but does not change the average row size. Similar rules can be applied to the other
operators.
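
The select and filter rules above can be illustrated with a minimal sketch. This is not Hive
code: the Stats and Operator classes, the annotate() function, and the fixed per-filter
selectivity value are all hypothetical names and assumptions chosen for illustration. The
sketch walks a simplified operator tree bottom-up, seeds the scan operator with table-level
statistics, and derives each downstream operator's statistics from its input.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Stats:
    """Statistics attached to one operator (hypothetical structure)."""
    num_rows: int
    avg_row_size: float  # bytes per row

    @property
    def data_size(self) -> float:
        """Estimated uncompressed size of the operator's output, in bytes."""
        return self.num_rows * self.avg_row_size


@dataclass
class Operator:
    """A node in a simplified operator tree; rows flow from inputs to this node."""
    kind: str                                 # "scan", "select", or "filter"
    inputs: List["Operator"] = field(default_factory=list)
    table_stats: Optional[Stats] = None       # scan only: stats as read from the metastore
    output_row_size: Optional[float] = None   # select only: avg size of the projected columns
    selectivity: Optional[float] = None       # filter only: assumed fraction of rows kept
    stats: Optional[Stats] = None             # filled in by annotate()


def annotate(op: Operator) -> Stats:
    """Annotate op and everything below it, returning op's output statistics."""
    # Post-order walk: an operator's statistics depend on those of its inputs.
    input_stats = [annotate(child) for child in op.inputs]

    if op.kind == "scan":
        # Leaf operator: take the table-level statistics as they are.
        op.stats = op.table_stats
    elif op.kind == "select":
        # Projection changes the average row size but not the number of rows.
        op.stats = Stats(input_stats[0].num_rows, op.output_row_size)
    elif op.kind == "filter":
        # Filtering changes the number of rows but not the average row size.
        kept = round(input_stats[0].num_rows * op.selectivity)
        op.stats = Stats(kept, input_stats[0].avg_row_size)
    else:
        raise ValueError(f"no annotation rule for operator kind {op.kind!r}")
    return op.stats


# Example tree: scan -> filter -> select, with made-up numbers.
scan = Operator("scan", table_stats=Stats(num_rows=1_000_000, avg_row_size=100.0))
flt = Operator("filter", inputs=[scan], selectivity=0.5)
sel = Operator("select", inputs=[flt], output_row_size=20.0)

annotate(sel)
for node in (scan, flt, sel):
    print(node.kind, node.stats, node.stats.data_size)
```

In this sketch the filter selectivity is supplied by hand; in practice it would have to be
estimated, for example from column-level statistics such as the number of distinct values.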

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.