hive-dev mailing list archives

From "Gang Tim Liu (JIRA)" <>
Subject [jira] [Commented] (HIVE-3917) Support fast operation for analyze command
Date Tue, 29 Jan 2013 19:13:14 GMT


Gang Tim Liu commented on HIVE-3917:

[~shreepadma] Sorry for not fully understanding your initial question.
[~ashutoshc] Thank you very much for explaining it in more detail and carrying the discussion forward.
Great! Thanks.

Yes, partial scan is a great choice. Actually, we have thought about it. With,
we can even achieve it for RCFile.

Yes, it will be faster than a full scan but still slower than noscan. In a big data warehouse,
partial scan is still an order of magnitude slower than noscan. Even with the potential speedup
from simple MR, it will still be slower than noscan.

That said, we can view all three as good choices for different use cases: noscan, partial scan,
and full scan (the default).

I will create a follow-up for partial scan.

> Support fast operation for analyze command
> ------------------------------------------
>                 Key: HIVE-3917
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>          Components: Statistics
>    Affects Versions: 0.11.0
>            Reporter: Gang Tim Liu
>            Assignee: Gang Tim Liu
>         Attachments: HIVE-3917.patch.1
> Hive supports the analyze command to gather statistics from existing tables/partitions.
> It collects:
> 1. Number of rows
> 2. Number of files
> 3. Size in bytes
> If a table/partition is big, the operation takes time since it opens all files
> and scans all data.
> It would be nice to support a fast operation that gathers the statistics which don't require
> opening all files:
> 1. Number of files
> 2. Size in bytes
> Potential syntax is 
> ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)] COMPUTE STATISTICS [noscan]
> In the future, all statistics without scan can be retrieved via this optional parameter.
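
To illustrate why the scan-free statistics are cheap, here is a minimal Python sketch on a local directory. It is only an analogy, not Hive's implementation: the function names (stats_noscan, stats_full_scan, make_demo_partition) are hypothetical, a "partition" is assumed to be a plain directory, and a "row" is assumed to be one newline-terminated line. File count and size come from filesystem metadata alone, while the row count forces every file to be opened and read.

```python
import os
import tempfile


def stats_noscan(directory):
    """Return file count and total size using filesystem metadata only.

    No file is opened, so the cost depends on the number of files,
    not on the amount of data they hold.
    """
    num_files = 0
    total_size = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                num_files += 1
                total_size += entry.stat().st_size
    return {"num_files": num_files, "total_size": total_size}


def stats_full_scan(directory):
    """Return the metadata statistics plus a row count.

    Counting rows requires opening and reading every file, which is
    what makes the full scan slow on large partitions.
    """
    stats = stats_noscan(directory)
    num_rows = 0
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            with open(path, "rb") as f:
                num_rows += sum(1 for _ in f)
    stats["num_rows"] = num_rows
    return stats


def make_demo_partition():
    """Create a temporary directory with two small files (3 and 2 rows)."""
    directory = tempfile.mkdtemp()
    for i, rows in enumerate([3, 2]):
        with open(os.path.join(directory, f"part-{i}"), "wb") as f:
            f.write(b"a,b\n" * rows)
    return directory
```

Calling stats_noscan(make_demo_partition()) touches only directory entries, while stats_full_scan on the same directory reads every byte to produce the row count.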
