Date: Thu, 2 May 2013 21:26:15 +0000 (UTC)
From: "Eric Hanson (JIRA)"
To: hive-dev@hadoop.apache.org
Subject: [jira] [Created] (HIVE-4478) In ORC, add boolean noNulls flag to column stripe metadata

Eric Hanson created HIVE-4478:
---------------------------------

             Summary: In ORC, add boolean noNulls flag to column stripe metadata
                 Key: HIVE-4478
                 URL: https://issues.apache.org/jira/browse/HIVE-4478
             Project: Hive
          Issue Type: Sub-task
            Reporter: Eric Hanson
            Assignee: Owen O'Malley

Currently, the stripe metadata for ORC contains the min and max value for each column in the stripe. This will be used for stripe elimination. However, an additional bit of metadata, noNulls (true/false), is needed to help speed up vectorized query execution by as much as 30%.
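To illustrate how per-stripe min/max values allow stripe elimination, here is a minimal sketch in Java. The class and method names (StripeColumnStats, StripeEliminationSketch, stripesToRead) are hypothetical and are not ORC's actual reader API; the sketch assumes a single long column and a closed range predicate.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-stripe statistics for one long column (not an ORC class).
final class StripeColumnStats {
    final long min;
    final long max;
    final boolean noNulls; // the additional flag proposed in this issue

    StripeColumnStats(long min, long max, boolean noNulls) {
        this.min = min;
        this.max = max;
        this.noNulls = noNulls;
    }
}

final class StripeEliminationSketch {
    // Returns the indices of stripes that may contain a row with lo <= value <= hi.
    // A stripe whose [min, max] range does not overlap [lo, hi] is skipped unread.
    static List<Integer> stripesToRead(List<StripeColumnStats> stats, long lo, long hi) {
        List<Integer> keep = new ArrayList<>();
        for (int i = 0; i < stats.size(); i++) {
            StripeColumnStats s = stats.get(i);
            if (s.max >= lo && s.min <= hi) {
                keep.add(i);
            }
        }
        return keep;
    }
}
```

The min and max values only rule stripes out; a stripe that survives the check still has to be read and filtered row by row.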
The vectorized query execution (QE) code has a Boolean flag called noNulls for each column vector. When this flag is true, all of the null-checking logic is skipped. For simple filters and arithmetic expressions, this can save on the order of 30% of the execution time. Once the noNulls stripe metadata is available, the vectorized iterator for ORC can be updated to avoid the cost of loading the isNull bitmap and to set the noNulls flag for each column vector efficiently.
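The idea can be sketched in Java as follows. This is an illustration under stated assumptions, not Hive's implementation: LongColumnVectorSketch is a simplified stand-in for a column vector, and initNulls and countGreaterThan are hypothetical helpers. The stripeNoNulls parameter represents the proposed stripe-level flag, and loadIsNull stands for whatever work is needed to decode the isNull bitmap.

```java
import java.util.function.Supplier;

// Simplified stand-in for a vectorized column of long values (assumption, not Hive's class).
final class LongColumnVectorSketch {
    final long[] vector;    // column values for the batch
    final boolean[] isNull; // per-row null markers, meaningful only when noNulls is false
    boolean noNulls;        // true means no row in the batch is null

    LongColumnVectorSketch(int size) {
        this.vector = new long[size];
        this.isNull = new boolean[size];
        this.noNulls = true;
    }
}

final class NoNullsSketch {
    // Hypothetical reader step: when the stripe metadata says the column has no nulls,
    // set the flag and never invoke loadIsNull, so the bitmap is not decoded at all.
    static void initNulls(LongColumnVectorSketch col, boolean stripeNoNulls,
                          Supplier<boolean[]> loadIsNull) {
        if (stripeNoNulls) {
            col.noNulls = true;
            return;
        }
        boolean[] bits = loadIsNull.get();
        System.arraycopy(bits, 0, col.isNull, 0, Math.min(bits.length, col.isNull.length));
        col.noNulls = false;
    }

    // Simple filter over the first n rows: counts rows with value > bound.
    static int countGreaterThan(LongColumnVectorSketch col, int n, long bound) {
        int count = 0;
        if (col.noNulls) {
            // Fast path: no null check inside the loop.
            for (int i = 0; i < n; i++) {
                if (col.vector[i] > bound) {
                    count++;
                }
            }
        } else {
            // Slow path: every row needs an isNull check first.
            for (int i = 0; i < n; i++) {
                if (!col.isNull[i] && col.vector[i] > bound) {
                    count++;
                }
            }
        }
        return count;
    }
}
```

Branching on noNulls once per batch, rather than checking isNull once per row, keeps the inner loop of the fast path free of null handling.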