Date: Fri, 18 Oct 2013 20:01:45 +0000 (UTC)
From: "Prasanth J (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Updated] (HIVE-5562) Provide stripe level column statistics in ORC

     [ https://issues.apache.org/jira/browse/HIVE-5562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth J updated HIVE-5562:
-----------------------------
    Status: Patch Available  (was: Open)

Marking it as patch available.

> Provide stripe level column statistics in ORC
> ---------------------------------------------
>
>                 Key: HIVE-5562
>                 URL: https://issues.apache.org/jira/browse/HIVE-5562
>             Project: Hive
>          Issue Type: New Feature
>          Components: File Formats
>    Affects Versions: 0.13.0
>            Reporter: Prasanth J
>            Assignee: Prasanth J
>              Labels: orcfile
>             Fix For: 0.13.0
>
>         Attachments: HIVE-5562.1.patch.txt
>
>
> ORC maintains two levels of column statistics: index statistics (for every row group) and file-level column statistics for the entire file. It would be useful to also have stripe-level column statistics, intermediate between the index and file statistics. The reason to maintain stripe-level statistics is that the current input split computation logic is based on stripe boundaries. If stripe-level statistics are available and a stripe cannot satisfy a predicate condition, then that entire stripe (and thus its split) can be eliminated from split computation.
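
The stripe elimination described in the issue can be illustrated with a minimal sketch. It assumes each stripe carries only a minimum and maximum for a single integer column, and that the predicate is a closed range on that column. The names StripeStats, stripe_may_match, and select_stripes are hypothetical and are not part of ORC, Hive, or the attached patch.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StripeStats:
    """Hypothetical per-stripe summary for one column (not ORC's real metadata layout)."""
    stripe_id: int
    min_value: int
    max_value: int


def stripe_may_match(stats: StripeStats, lower: int, upper: int) -> bool:
    """Return True if some row in the stripe could satisfy lower <= col <= upper.

    The stripe can be skipped only when its [min, max] range does not
    overlap the predicate range at all.
    """
    return stats.max_value >= lower and stats.min_value <= upper


def select_stripes(all_stats: List[StripeStats], lower: int, upper: int) -> List[int]:
    """Keep the ids of stripes that cannot be ruled out by their statistics."""
    return [s.stripe_id for s in all_stats if stripe_may_match(s, lower, upper)]


# Illustrative data: three stripes with disjoint value ranges.
stripes = [
    StripeStats(stripe_id=0, min_value=1, max_value=100),
    StripeStats(stripe_id=1, min_value=101, max_value=200),
    StripeStats(stripe_id=2, min_value=201, max_value=300),
]

# Predicate: 150 <= col <= 180. Only stripe 1 overlaps this range.
kept = select_stripes(stripes, 150, 180)
print(kept)
```

Because split computation follows stripe boundaries, a stripe dropped by a check of this kind would also drop its split. Statistics can only rule a stripe out; a stripe that passes the check still has to be read and filtered row by row.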