Date: Wed, 6 May 2015 21:15:00 +0000 (UTC)
From: "Sushanth Sowmyan (JIRA)"
To: issues@hive.apache.org
Subject: [jira] [Commented] (HIVE-9451) Add max size of column dictionaries to ORC metadata

    [ https://issues.apache.org/jira/browse/HIVE-9451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531432#comment-14531432 ]

Sushanth Sowmyan commented on HIVE-9451:
----------------------------------------

After discussion with Owen, marking this as tentative for 1.2. That is, it will not hold up the RC process for 1.2.0, but if it makes it in before we release, it will be part of 1.2.0. It will still be honoured for inclusion in 1.2.1 when we do that release.
> Add max size of column dictionaries to ORC metadata
> ---------------------------------------------------
>
>                 Key: HIVE-9451
>                 URL: https://issues.apache.org/jira/browse/HIVE-9451
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>              Labels: ORC
>             Fix For: 1.2.0
>
>         Attachments: HIVE-9451.patch, HIVE-9451.patch
>
>
> To predict the amount of memory required to read an ORC file, we need to know the size of the dictionaries for the columns that we are reading. I propose adding the number of bytes for each column's dictionary to the stripe's column statistics. The file's column statistics would have the maximum dictionary size for each column.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
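The proposal in the quoted description can be sketched as a small model. It is hypothetical and does not use ORC's actual classes, protobuf fields, or reader API.

- `StripeStats`, `fileMaxDictionarySizes` and `estimateDictionaryMemory` are illustrative names.
- Each stripe records dictionary bytes per column id.
- The file-level statistics keep the per-column maximum across stripes.
- The memory estimate assumes a reader holds only one stripe's dictionaries at a time. Under that assumption, the sum of the file-level maxima over the selected columns is an upper bound on dictionary memory.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the HIVE-9451 proposal; these are NOT the real ORC reader/writer classes.
public class DictionarySizeSketch {

    /** Hypothetical per-stripe column statistics: dictionary size in bytes, keyed by column id. */
    public static final class StripeStats {
        final Map<Integer, Long> dictionaryBytesByColumn;

        public StripeStats(Map<Integer, Long> dictionaryBytesByColumn) {
            this.dictionaryBytesByColumn = dictionaryBytesByColumn;
        }
    }

    /** File-level statistics: the largest dictionary seen for each column across all stripes. */
    public static Map<Integer, Long> fileMaxDictionarySizes(List<StripeStats> stripes) {
        Map<Integer, Long> maxByColumn = new HashMap<>();
        for (StripeStats stripe : stripes) {
            for (Map.Entry<Integer, Long> e : stripe.dictionaryBytesByColumn.entrySet()) {
                // Keep the larger of the stored value and this stripe's value for the column.
                maxByColumn.merge(e.getKey(), e.getValue(), Long::max);
            }
        }
        return maxByColumn;
    }

    /**
     * Upper bound on dictionary memory for reading the given columns, ASSUMING the reader
     * holds only one stripe's dictionaries at a time. Columns with no dictionary count as 0.
     */
    public static long estimateDictionaryMemory(Map<Integer, Long> fileMax, List<Integer> columns) {
        long total = 0;
        for (int column : columns) {
            total += fileMax.getOrDefault(column, 0L);
        }
        return total;
    }

    public static void main(String[] args) {
        List<StripeStats> stripes = List.of(
                new StripeStats(Map.of(1, 100L, 2, 40L)),
                new StripeStats(Map.of(1, 250L, 3, 10L)));
        Map<Integer, Long> fileMax = fileMaxDictionarySizes(stripes);
        // Column 1 peaks at 250 bytes (second stripe), column 2 at 40 bytes (first stripe).
        System.out.println(estimateDictionaryMemory(fileMax, List.of(1, 2))); // prints 290
    }
}
```

Keeping only the per-column maximum at file level keeps the file footer small. A reader can still bound memory for any column subset before touching a stripe. The trade-off is that the bound is conservative when one stripe has an unusually large dictionary.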