hive-dev mailing list archives

From "Gang Tim Liu (JIRA)" <>
Subject [jira] [Created] (HIVE-3958) support partial scan for analyze command
Date Tue, 29 Jan 2013 19:27:13 GMT
Gang Tim Liu created HIVE-3958:

             Summary: support partial scan for analyze command
                 Key: HIVE-3958
             Project: Hive
          Issue Type: Improvement
            Reporter: Gang Tim Liu
            Assignee: Gang Tim Liu

The analyze command lets us collect statistics on existing tables/partitions. It works well
but can be slow because it scans every file.

There are two ways to speed it up:
1. Collect stats without a file scan. This may not collect all stats, but it is good and fast
enough for some use cases. HIVE-3917 addresses it.
2. Collect stats via a partial file scan. This does not scan the full content of each file, only
the part that holds the file metadata. Examples of formats where this applies are RCFile,
ORC (HIVE-3874) and HBase's HFile.
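
To illustrate the idea behind option 2, here is a minimal Python sketch, not Hive code. It assumes a hypothetical toy file layout in which the writer appends a fixed-size footer holding the row count and raw data size; the real RCFile, ORC and HFile layouts differ. All names in it (FOOTER, write_toy_file, full_scan_stats, partial_scan_stats, demo) are made up for illustration.

```python
import os
import struct
import tempfile

# Hypothetical toy layout (an assumption, not a real Hive format):
# newline-terminated rows followed by a 16-byte footer that stores
# (row_count, raw_data_size) as two little-endian unsigned 64-bit ints.
FOOTER = struct.Struct("<QQ")


def write_toy_file(path, rows):
    """Write the rows, then append the metadata footer."""
    raw_size = 0
    with open(path, "wb") as f:
        for row in rows:
            data = row.encode("utf-8") + b"\n"
            f.write(data)
            raw_size += len(data)
        f.write(FOOTER.pack(len(rows), raw_size))


def full_scan_stats(path):
    """Full scan: read the whole file and count rows and bytes."""
    with open(path, "rb") as f:
        body = f.read()[:-FOOTER.size]
    return body.count(b"\n"), len(body)


def partial_scan_stats(path):
    """Partial scan: seek to the footer and read only the metadata."""
    with open(path, "rb") as f:
        f.seek(-FOOTER.size, os.SEEK_END)
        return FOOTER.unpack(f.read(FOOTER.size))


def demo():
    """Write a small file and return the stats from both approaches."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "part-00000.toy")
        write_toy_file(path, ["a,1", "b,2", "c,3"])
        return full_scan_stats(path), partial_scan_stats(path)
```

Both functions return the same statistics, but the partial scan reads a fixed number of bytes regardless of file size. It can only return whatever the writer recorded in the metadata.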

This JIRA targets option 2.
