hadoop-hive-dev mailing list archives

From "Ning Zhang (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HIVE-1361) table/partition level statistics
Date Tue, 21 Sep 2010 22:46:34 GMT

     [ https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1361:
-----------------------------

    Attachment: HIVE-1361.2.patch
                HIVE-1361.2_java_only.patch

Uploading a new patch for review in two versions: the full patch, and a Java-only version that
also includes the XML build files. Both are against the latest trunk.

The major changes since the last patch are:
  1) JDBC update/insert/select now use PreparedStatement.
  2) In HBase, use HTable.delete(ArrayList<Delete>) to speed up deletes, and flushCommits()
to batch updates.
  3) Refactor StatsTask to put stats into PartitionStatistics and TableStatistics so that
it is easier to add new stats later.
  4) Move WriteEntity creation from StatsTask to compile time.
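
To illustrate item 1, here is a minimal sketch of an update-then-insert through PreparedStatement. It is not taken from the patch: the class name, the table PARTITION_STATS_SKETCH, and its columns are hypothetical and chosen only for illustration.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Hypothetical helper; table and column names are illustrative, not Hive's schema. */
class StatsPublisherSketch {
    // Placeholders (?) let the driver bind values instead of concatenating them into SQL.
    static final String UPDATE_SQL =
        "UPDATE PARTITION_STATS_SKETCH SET ROW_COUNT = ? WHERE ID = ?";
    static final String INSERT_SQL =
        "INSERT INTO PARTITION_STATS_SKETCH (ID, ROW_COUNT) VALUES (?, ?)";

    /** Updates the row count for id, inserting a new row if none exists yet. */
    static void publish(Connection conn, String id, long rowCount) throws SQLException {
        try (PreparedStatement update = conn.prepareStatement(UPDATE_SQL)) {
            update.setLong(1, rowCount);
            update.setString(2, id);
            if (update.executeUpdate() > 0) {
                return; // an existing row was updated, nothing left to do
            }
        }
        try (PreparedStatement insert = conn.prepareStatement(INSERT_SQL)) {
            insert.setString(1, id);
            insert.setLong(2, rowCount);
            insert.executeUpdate();
        }
    }
}
```

Binding values through placeholders avoids quoting problems in partition identifiers and lets the database reuse the parsed statement across calls.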

I'm rerunning the tests after refreshing to the latest trunk.

> table/partition level statistics
> --------------------------------
>
>                 Key: HIVE-1361
>                 URL: https://issues.apache.org/jira/browse/HIVE-1361
>             Project: Hadoop Hive
>          Issue Type: Sub-task
>          Components: Query Processor
>            Reporter: Ning Zhang
>            Assignee: Ahmed M Aly
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1361.2.patch, HIVE-1361.2_java_only.patch, HIVE-1361.java_only.patch, HIVE-1361.patch, stats0.patch
>
>
> As a first step, we gather table-level stats for non-partitioned tables and partition-level
> stats for partitioned tables. Future work could extend the table-level stats to partitioned
> tables as well.
> There are 3 major milestones in this subtask: 
>  1) extend the insert statement to gather table/partition level stats on-the-fly.
>  2) extend metastore API to support storing and retrieving stats for a particular table/partition.
>  3) add an ANALYZE TABLE [PARTITION] statement in Hive QL to gather stats for existing
> tables/partitions.
> The proposed stats are:
> Partition-level stats: 
>   - number of rows
>   - total size in bytes
>   - number of files
>   - max, min, average row sizes
>   - max, min, average file sizes
> Table-level stats in addition to partition level stats:
>   - number of partitions
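
The proposed stats could be modeled roughly as follows. This is a sketch under stated assumptions, not the PartitionStatistics and TableStatistics classes from the patch: all class, field, and method names are hypothetical, and max/min row sizes are left out because they would require per-row measurements.

```java
import java.util.List;

/** Hypothetical sketch of partition-level stats; names are illustrative. */
class PartitionStatsSketch {
    final long numRows;
    final long totalSize;   // total size in bytes
    final int numFiles;
    final long maxFileSize;
    final long minFileSize;

    PartitionStatsSketch(long numRows, long[] fileSizes) {
        long total = 0;
        long max = 0;
        long min = fileSizes.length == 0 ? 0 : Long.MAX_VALUE;
        for (long size : fileSizes) {
            total += size;
            max = Math.max(max, size);
            min = Math.min(min, size);
        }
        this.numRows = numRows;
        this.totalSize = total;
        this.numFiles = fileSizes.length;
        this.maxFileSize = max;
        this.minFileSize = min;
    }

    double avgFileSize() {
        return numFiles == 0 ? 0.0 : (double) totalSize / numFiles;
    }

    double avgRowSize() {
        return numRows == 0 ? 0.0 : (double) totalSize / numRows;
    }
}

/** Hypothetical table-level stats: partition totals plus the partition count. */
class TableStatsSketch {
    final int numPartitions;
    final long numRows;
    final long totalSize;
    final int numFiles;

    private TableStatsSketch(int numPartitions, long numRows, long totalSize, int numFiles) {
        this.numPartitions = numPartitions;
        this.numRows = numRows;
        this.totalSize = totalSize;
        this.numFiles = numFiles;
    }

    /** Sums the per-partition stats into table-level totals. */
    static TableStatsSketch of(List<PartitionStatsSketch> partitions) {
        long rows = 0;
        long size = 0;
        int files = 0;
        for (PartitionStatsSketch p : partitions) {
            rows += p.numRows;
            size += p.totalSize;
            files += p.numFiles;
        }
        return new TableStatsSketch(partitions.size(), rows, size, files);
    }
}
```

Keeping the two levels in separate classes mirrors the idea that table-level stats are the partition-level stats plus a partition count, so a new stat only needs to be added in one place and then summed.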

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

