hive-dev mailing list archives

From "Owen O'Malley (JIRA)" <>
Subject [jira] [Commented] (HIVE-4478) In ORC, add boolean noNulls flag to column stripe metadata
Date Fri, 31 May 2013 15:44:46 GMT


Owen O'Malley commented on HIVE-4478:

I've pushed this to Prasanth. I think the best approach is to suppress the isPresent bit stream
in the case that the entire column is present for the stripe. The ORC reader already handles
this correctly by assuming that all values are present.
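To make the suggestion concrete, here is a rough sketch of the idea in Java. It is not the ORC writer or reader code: the class and method names (PresentStreamSketch, encodePresent, decodePresent) are hypothetical, and an Optional stands in for "the stream was or was not written" in the stripe.

```java
import java.util.Arrays;
import java.util.Optional;

/** Hypothetical sketch of isPresent suppression; not the actual ORC API. */
final class PresentStreamSketch {

    /** Writer side: emit the isPresent bits only if at least one value is null. */
    static Optional<boolean[]> encodePresent(Object[] column) {
        boolean[] present = new boolean[column.length];
        boolean anyNull = false;
        for (int i = 0; i < column.length; i++) {
            present[i] = column[i] != null;
            anyNull |= !present[i];
        }
        // Suppress the stream entirely when every value in the stripe is present.
        return anyNull ? Optional.of(present) : Optional.empty();
    }

    /** Reader side: a missing stream is interpreted as "all values present". */
    static boolean[] decodePresent(Optional<boolean[]> stream, int rowCount) {
        if (stream.isPresent()) {
            return stream.get();
        }
        boolean[] all = new boolean[rowCount];
        Arrays.fill(all, true);
        return all;
    }
}
```

With this shape, the absence of the stream carries the same information as a separate noNulls flag would, so no extra metadata field is needed.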
> In ORC, add boolean noNulls flag to column stripe metadata
> ----------------------------------------------------------
>                 Key: HIVE-4478
>                 URL:
>             Project: Hive
>          Issue Type: Sub-task
>          Components: File Formats
>            Reporter: Eric Hanson
>            Assignee: Prasanth J
> Currently, the stripe metadata for ORC contains the min and max value for each column
> in the stripe. This will be used for stripe elimination. However, an additional bit of metadata
> for each column for each stripe, noNulls (true/false), is needed to help speed up vectorized
> query execution by as much as 30%.
>
> The vectorized QE code has a Boolean flag for each column vector called noNulls. If this
> is true, all the null-checking logic is skipped for that column for a VectorizedRowBatch when
> an operation is performed on that column. For simple filters and arithmetic expressions, this
> can save on the order of 30% of the time.
>
> Once this noNulls stripe metadata is available, the vectorized iterator (reader) for
> ORC can be updated to avoid the cost of loading the isNull bitmap entirely and to set the
> noNulls flag for each column vector efficiently.
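
The fast path described in the issue can be illustrated with a minimal Java sketch. LongVectorSketch is a hypothetical stand-in for a column vector, not Hive's own class, and fromStripe with its Supplier argument is an assumed way to model "load the isNull bitmap only when needed".

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for a vectorized long column; not Hive's class. */
final class LongVectorSketch {
    final long[] vector;
    final boolean[] isNull;   // only consulted when noNulls is false
    final boolean noNulls;

    LongVectorSketch(long[] vector, boolean[] isNull, boolean noNulls) {
        this.vector = vector;
        this.isNull = isNull;
        this.noNulls = noNulls;
    }

    /** Reader side: skip loading the null bitmap when the stripe has no nulls. */
    static LongVectorSketch fromStripe(long[] values, boolean stripeNoNulls,
                                       Supplier<boolean[]> isNullLoader) {
        if (stripeNoNulls) {
            // The loader is never invoked, so the bitmap is never read.
            return new LongVectorSketch(values, new boolean[values.length], true);
        }
        return new LongVectorSketch(values, isNullLoader.get(), false);
    }

    /** Adds a scalar in place; the noNulls branch has no per-row null check. */
    static void addScalar(LongVectorSketch col, long scalar) {
        if (col.noNulls) {
            for (int i = 0; i < col.vector.length; i++) {
                col.vector[i] += scalar;
            }
        } else {
            for (int i = 0; i < col.vector.length; i++) {
                if (!col.isNull[i]) {
                    col.vector[i] += scalar;
                }
            }
        }
    }
}
```

The tight loop in the noNulls branch is where the savings quoted above would come from; the sketch itself makes no claim about the actual speedup.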
