hive-dev mailing list archives

From "Eric Hanson (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-4478) In ORC, add boolean noNulls flag to column stripe metadata
Date Thu, 02 May 2013 21:26:15 GMT
Eric Hanson created HIVE-4478:
---------------------------------

             Summary: In ORC, add boolean noNulls flag to column stripe metadata
                 Key: HIVE-4478
                 URL: https://issues.apache.org/jira/browse/HIVE-4478
             Project: Hive
          Issue Type: Sub-task
            Reporter: Eric Hanson
            Assignee: Owen O'Malley


Currently, the stripe metadata for ORC contains the min and max values for each column in the
stripe. These will be used for stripe elimination. However, an additional piece of metadata,
noNulls (true/false), is needed to help speed up vectorized query execution by as much as 30%.


The vectorized QE code has a boolean flag called noNulls for each column vector. If this flag
is true, all the null-checking logic is skipped. For simple filters and arithmetic expressions,
this can save on the order of 30% of the execution time.
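
A minimal sketch of the idea, in Java. The class and method names here
(LongColumnSketch, addScalar) are hypothetical and simplified for illustration; they are
not Hive's actual vectorization classes. Only the noNulls and isNull names come from the
description above.

```java
// Hypothetical, simplified column vector; not Hive's actual class.
final class LongColumnSketch {
    final long[] vector;     // column values for one batch
    final boolean[] isNull;  // per-row null markers, consulted only when noNulls is false
    boolean noNulls;         // true when the batch is known to contain no nulls

    LongColumnSketch(int size) {
        vector = new long[size];
        isNull = new boolean[size];
        noNulls = true;
    }

    // Adds a scalar to every non-null value in the first n rows, in place.
    static void addScalar(LongColumnSketch col, long scalar, int n) {
        if (col.noNulls) {
            // Fast path: tight loop with no per-row null branch.
            for (int i = 0; i < n; i++) {
                col.vector[i] += scalar;
            }
        } else {
            // Slow path: every row pays for a null check.
            for (int i = 0; i < n; i++) {
                if (!col.isNull[i]) {
                    col.vector[i] += scalar;
                }
            }
        }
    }
}
```

The single branch on noNulls is taken once per batch, so the fast path avoids a
data-dependent branch on every row.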

Once this noNulls stripe metadata is available, the vectorized iterator for ORC can be updated
to skip loading the isNull bitmap entirely for columns that have no nulls in the stripe, and to
set the noNulls flag for each column vector cheaply.
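
One way the reader side could look, as a rough Java sketch. Every name below
(StripeColumnStatsSketch, BatchColumnSketch, StripeReaderSketch, prepareNulls) is a
hypothetical stand-in and not part of the ORC reader API; the bitmapLoader argument
simply stands in for whatever work is needed to read and decode the isNull bitmap.

```java
import java.util.function.Supplier;

// Hypothetical per-column stripe statistics carrying the proposed flag.
final class StripeColumnStatsSketch {
    final long min;
    final long max;
    final boolean noNulls;  // proposed: true when the column has no nulls in this stripe

    StripeColumnStatsSketch(long min, long max, boolean noNulls) {
        this.min = min;
        this.max = max;
        this.noNulls = noNulls;
    }
}

// Hypothetical null-tracking part of a column vector.
final class BatchColumnSketch {
    boolean noNulls;
    boolean[] isNull;
}

final class StripeReaderSketch {
    // Prepares the null information of one batch column from the stripe metadata.
    static void prepareNulls(StripeColumnStatsSketch stats,
                             BatchColumnSketch col,
                             Supplier<boolean[]> bitmapLoader) {
        if (stats.noNulls) {
            // The stripe guarantees no nulls: the bitmap is never loaded.
            col.noNulls = true;
        } else {
            // Nulls are possible: load the bitmap and keep null checks enabled.
            col.isNull = bitmapLoader.get();
            col.noNulls = false;
        }
    }
}
```

In this sketch the else branch conservatively leaves noNulls false even if a particular
batch happens to contain no nulls; a real reader might refine that per batch.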

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
