hadoop-hive-dev mailing list archives

From "Ning Zhang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-1361) table/partition level statistics
Date Mon, 09 Aug 2010 17:33:20 GMT

    [ https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896634#action_12896634 ]

Ning Zhang commented on HIVE-1361:

Ahmed has put up the design doc on the wiki: http://wiki.apache.org/hadoop/Hive/StatsDev.

Ahmed is also finalizing the patch for review. 

There are some minor changes from the original requirement: the stats currently gathered are
the number of rows, total size in bytes, and number of files, plus the number of partitions
(for tables). They do not include the min/max/avg of row and file sizes, because the raw sizes
on disk (serialized and compressed) differ from the sizes we see during stats gathering
(deserialized and decompressed). Since there are also no strong use cases for them currently,
we'll exclude them from this patch.
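To illustrate why the gathered row sizes would not match what is stored, here is a toy example. It assumes that a tab-delimited text line stands in for a Hive SerDe and that Python's zlib stands in for the table's compression codec. Neither reflects Hive's actual code path. The rows and the date value are made up.

```python
import zlib

# Hypothetical rows as the stats gatherer would see them: already
# deserialized into Python tuples.
rows = [(i, "event", "2010-08-09") for i in range(1000)]


def encode(row):
    # Assumption: a tab-delimited text line stands in for a Hive SerDe.
    return ("\t".join(str(col) for col in row) + "\n").encode("utf-8")


# Uncompressed size, roughly what is visible while gathering stats.
observed_bytes = sum(len(encode(r)) for r in rows)

# Size on disk after compression (zlib stands in for the table's codec).
stored_bytes = len(zlib.compress(b"".join(encode(r) for r in rows)))

print(f"avg row size while gathering: {observed_bytes / len(rows):.1f} bytes")
print(f"avg row size on disk:         {stored_bytes / len(rows):.1f} bytes")
```

Because the data is highly repetitive, the compressed size is much smaller. A min/max/avg row size therefore depends on which of the two sizes is measured.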

> table/partition level statistics
> --------------------------------
>                 Key: HIVE-1361
>                 URL: https://issues.apache.org/jira/browse/HIVE-1361
>             Project: Hadoop Hive
>          Issue Type: Sub-task
>    Affects Versions: 0.6.0
>            Reporter: Ning Zhang
>            Assignee: Ahmed M Aly
> As a first step, we gather table-level stats for non-partitioned tables and partition-level
> stats for partitioned tables. Future work could extend the table-level stats to partitioned
> tables as well.
> There are 3 major milestones in this subtask: 
>  1) extend the insert statement to gather table/partition level stats on-the-fly.
>  2) extend the metastore API to support storing and retrieving stats for a particular table/partition.
>  3) add an ANALYZE TABLE [PARTITION] statement in Hive QL to gather stats for existing tables/partitions.
> The proposed stats are:
> Partition-level stats: 
>   - number of rows
>   - total size in bytes
>   - number of files
>   - max, min, average row sizes
>   - max, min, average file sizes
> Table-level stats in addition to partition-level stats:
>   - number of partitions
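The three milestones above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the HIVE-1361 patch. InMemoryStatsStore, put_stats, get_stats, collect_stats, and the table name page_views are all hypothetical stand-ins. They are not Hive or metastore APIs. Stats are keyed by (table, partition), and partition is None for a non-partitioned table.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class BasicStats:
    num_rows: int = 0
    total_size: int = 0  # bytes
    num_files: int = 0


class InMemoryStatsStore:
    """Hypothetical stand-in for the metastore calls of milestone 2."""

    def __init__(self) -> None:
        self._stats: Dict[Tuple[str, Optional[str]], BasicStats] = {}

    def put_stats(self, table: str, partition: Optional[str], stats: BasicStats) -> None:
        self._stats[(table, partition)] = stats

    def get_stats(self, table: str, partition: Optional[str] = None) -> Optional[BasicStats]:
        # Returns None if no stats were stored for this table/partition.
        return self._stats.get((table, partition))


def collect_stats(files: Iterable[List[bytes]]) -> BasicStats:
    """Count stats in one pass over encoded rows, one list per file.

    Assumption: during an INSERT (milestone 1) this pass would run as rows
    are written; for ANALYZE TABLE (milestone 3) it would run over files
    that already exist.
    """
    stats = BasicStats()
    for rows in files:
        stats.num_files += 1
        for row in rows:
            stats.num_rows += 1
            stats.total_size += len(row)
    return stats


store = InMemoryStatsStore()
part = "ds=2010-08-09"  # hypothetical partition spec
store.put_stats("page_views", part, collect_stats([[b"a\n", b"bb\n"], [b"ccc\n"]]))
print(store.get_stats("page_views", part))
# BasicStats(num_rows=3, total_size=9, num_files=2)
```

Keeping the counting pass separate from the store means the same function could serve both the INSERT path and an ANALYZE-style pass.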
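The proposed stats list above can be computed as in the following sketch. Note that it follows the original proposal, including the min/max/average sizes that the comment says are excluded from the patch. The input format is a hypothetical one: each partition maps to a list of files, and each file is a list of row sizes in bytes. The sketch also assumes that a file's size is simply the sum of its row sizes, which ignores headers and compression. Table-level totals are sums over partitions, plus the partition count.

```python
from typing import Dict, List, Optional, Tuple

Summary = Optional[Tuple[int, int, float]]


def _summary(values: List[int]) -> Summary:
    # (min, max, average), or None when there is nothing to summarize.
    if not values:
        return None
    return (min(values), max(values), sum(values) / len(values))


def partition_stats(files: List[List[int]]) -> dict:
    row_sizes = [size for f in files for size in f]
    # Assumption: file size == sum of its row sizes (no headers/compression).
    file_sizes = [sum(f) for f in files]
    return {
        "num_rows": len(row_sizes),
        "total_size": sum(file_sizes),
        "num_files": len(files),
        "row_size": _summary(row_sizes),
        "file_size": _summary(file_sizes),
    }


def table_stats(partitions: Dict[str, List[List[int]]]) -> dict:
    per_part = {name: partition_stats(f) for name, f in partitions.items()}
    return {
        "num_partitions": len(partitions),
        "num_rows": sum(p["num_rows"] for p in per_part.values()),
        "total_size": sum(p["total_size"] for p in per_part.values()),
        "num_files": sum(p["num_files"] for p in per_part.values()),
        "partitions": per_part,
    }


# Hypothetical partitions: ds=2010-08-08 has two files, ds=2010-08-09 one.
t = table_stats({
    "ds=2010-08-08": [[100, 120], [80]],
    "ds=2010-08-09": [[90, 110, 100]],
})
print(t["num_partitions"], t["num_rows"], t["total_size"])  # 2 6 600
```

The sketch stops at totals for table-level row and file sizes. If those were needed, a table-level average would have to be recomputed from the totals rather than by averaging the per-partition averages.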

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
