hive-dev mailing list archives

From "pengcheng xiong (JIRA)" <>
Subject [jira] [Commented] (HIVE-7654) A method to extrapolate columnStats for partitions of a table
Date Tue, 12 Aug 2014 17:13:11 GMT


pengcheng xiong commented on HIVE-7654:



Pengcheng Xiong


> A method to extrapolate columnStats for partitions of a table
> -------------------------------------------------------------
>                 Key: HIVE-7654
>                 URL:
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: pengcheng xiong
>            Assignee: pengcheng xiong
>            Priority: Minor
>         Attachments: Extrapolate the Column Status.docx, HIVE-7654.0.patch
> In a PARTITIONED table, there are many partitions. For example, 
> create table if not exists loc_orc (
>   state string,
>   locid int,
>   zip bigint
> ) partitioned by(year string) stored as orc;
> We assume there are 4 partitions: partition(year='2000'), partition(year='2001'),
> partition(year='2002') and partition(year='2003').
> We can use the following command to compute statistics for columns state and locid of partition(year='2001'):
> analyze table loc_orc partition(year='2001') compute statistics for columns state,locid;
> We need to know the “aggregated” column statistics for the whole table loc_orc. However,
> we may not have column statistics for some partitions, e.g., partition(year='2002'), and
> we may not have column statistics for some columns, e.g., zip bigint for partition(year='2001').
> We propose a method to extrapolate the missing column statistics for the partitions.
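
To make the problem above concrete, here is a minimal Python sketch of one naive way to fill in table-level statistics when only some partitions have been analyzed. It is not the method in the attached patch or design document, which this message does not describe. The ColumnStats type, the extrapolate function, and all the numbers are hypothetical and chosen only for illustration. The assumption is that additive statistics (such as the null count) scale with the number of partitions, while min/max are taken over the partitions that do have statistics.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ColumnStats:
    """Hypothetical per-partition statistics for one numeric column."""
    num_nulls: int
    low: int
    high: int


def extrapolate(stats_by_partition: Dict[str, Optional[ColumnStats]]) -> Optional[ColumnStats]:
    """Naively estimate table-level stats from the partitions that have them.

    A value of None means the partition was never analyzed for this column.
    """
    known = [s for s in stats_by_partition.values() if s is not None]
    if not known:
        return None  # nothing to extrapolate from

    # Assumption: unanalyzed partitions look like the analyzed ones on average,
    # so additive statistics are scaled up by (all partitions / known partitions).
    scale = len(stats_by_partition) / len(known)
    return ColumnStats(
        num_nulls=round(sum(s.num_nulls for s in known) * scale),
        # Min/max are only bounds observed so far; they are not scaled.
        low=min(s.low for s in known),
        high=max(s.high for s in known),
    )


# Made-up statistics for column locid: two partitions analyzed, two missing.
locid_stats = {
    "2000": ColumnStats(num_nulls=2, low=1, high=40),
    "2001": ColumnStats(num_nulls=4, low=5, high=60),
    "2002": None,
    "2003": None,
}
result = extrapolate(locid_stats)
print(result)
```

A real implementation would also have to handle the second case mentioned above (a column missing from an otherwise analyzed partition) and statistics that are not additive, such as the number of distinct values.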

