Date: Fri, 22 Aug 2014 23:38:13 +0000 (UTC)
From: "pengcheng xiong (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Commented] (HIVE-7654) A method to extrapolate columnStats for partitions of a table

    [ https://issues.apache.org/jira/browse/HIVE-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107709#comment-14107709 ]

pengcheng xiong commented on HIVE-7654:
---------------------------------------

Dear QA,

I think the failed test org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization is unrelated to the patch. Thanks!

> A method to extrapolate columnStats for partitions of a table
> -------------------------------------------------------------
>
>                 Key: HIVE-7654
>                 URL: https://issues.apache.org/jira/browse/HIVE-7654
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: pengcheng xiong
>            Assignee: pengcheng xiong
>            Priority: Minor
>         Attachments: Extrapolate the Column Status.docx, HIVE-7654.0.patch, HIVE-7654.1.patch, HIVE-7654.4.patch, HIVE-7654.6.patch, HIVE-7654.7.patch, HIVE-7654.8.patch, HIVE-7654.9.patch
>
>
> A partitioned table can have many partitions. For example:
> create table if not exists loc_orc (
>   state string,
>   locid int,
>   zip bigint
> ) partitioned by(year string) stored as orc;
> Assume there are 4 partitions: partition(year='2000'), partition(year='2001'), partition(year='2002') and partition(year='2003').
> We can use the following command to compute statistics for the columns state and locid of partition(year='2001'):
> analyze table loc_orc partition(year='2001') compute statistics for columns state,locid;
> We need to know the “aggregated” column statistics for the whole table loc_orc. However, we may not have column statistics for some partitions, e.g., partition(year='2002'), and we may not have statistics for some columns, e.g., zip bigint for partition(year='2001').
> We propose a method to extrapolate the missing column statistics for the partitions.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
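
The issue text above does not spell out the extrapolation method, so the following is only a rough sketch of one way table-level statistics could be estimated when some partitions have none. It is not the code from the attached patches. The function names, the linear-trend assumption, and all numbers are hypothetical. Partitions are identified by their position in sorted order (year='2000' is 0, year='2003' is 3).

```python
from typing import Callable, Dict, Iterable


def scale_additive(known_values: Iterable[float], total_partitions: int) -> float:
    """Estimate a table-wide additive statistic (e.g. a null count).

    Assumption: partitions without statistics behave like the average
    of the partitions that have them.
    """
    values = list(known_values)
    if not values:
        raise ValueError("need statistics for at least one partition")
    return sum(values) * total_partitions / len(values)


def extrapolate_extreme(known: Dict[int, float], total_partitions: int,
                        pick: Callable[[Iterable[float]], float]) -> float:
    """Estimate a table-wide min or max from per-partition values.

    `known` maps partition position -> observed value. Assumption: the
    value follows a straight line between the first and last known
    partitions, and that line is extended to both ends of the table.
    `pick` is the built-in `max` or `min`.
    """
    if not known:
        raise ValueError("need statistics for at least one partition")
    lo, hi = min(known), max(known)
    candidates = list(known.values())
    if hi > lo:
        slope = (known[hi] - known[lo]) / (hi - lo)
        for border in (0, total_partitions - 1):
            candidates.append(known[lo] + slope * (border - lo))
    return pick(candidates)


# Hypothetical statistics for column locid, known only for the
# partitions at positions 0 (year='2000') and 1 (year='2001').
print(scale_additive([10, 20], total_partitions=4))              # 60.0
print(extrapolate_extreme({0: 100.0, 1: 200.0}, 4, max))         # 400.0
```

With only one analyzed partition there is no trend to extend, so the sketch falls back to the single known value for min and max.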