hive-dev mailing list archives

From "Prasanth J (JIRA)" <>
Subject [jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved
Date Sat, 20 Jul 2013 01:03:46 GMT


Prasanth J commented on HIVE-4123:

Thanks, Owen, for the review comments. There are a few things I want to confirm before submitting
the next version of the patch.

1) In the current implementation, I kept the delta base field optional (used only for fixed
delta runs) and zigzag encoded the delta blob so that we don't have to deal with the sign of the
delta values.

I can make the delta base field mandatory, so that it stores the base (absolute minimum) of the
delta values, and zigzag encode it. With the base value and the delta base value, we should be able
to tell whether the sequence is monotonically increasing or decreasing, and also identify
the sign of the delta values. I hope this is what you are looking for. Please correct me if
my understanding is wrong.
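
To make the idea in point 1 concrete, here is a minimal sketch of zigzag encoding and of a delta run whose direction is carried by a signed delta base. It is not the layout used in the patch: the function names are hypothetical, and it assumes the delta base is simply the first delta, with the remaining deltas stored as unsigned magnitudes.

```python
MASK64 = (1 << 64) - 1


def zigzag_encode(n: int) -> int:
    """Map a signed 64-bit integer to an unsigned one (0, -1, 1, -2 -> 0, 1, 2, 3)."""
    return ((n << 1) ^ (n >> 63)) & MASK64


def zigzag_decode(z: int) -> int:
    """Inverse of zigzag_encode."""
    return (z >> 1) ^ -(z & 1)


def encode_delta_run(values):
    """Encode a monotonic run as (base, zigzag(delta base), unsigned magnitudes).

    Assumption for this sketch: the delta base is the first delta, and its
    sign gives the direction of the whole run.
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    deltas = [b - a for a, b in zip(values, values[1:])]
    first = deltas[0]
    if first >= 0:
        monotonic = all(d >= 0 for d in deltas)
    else:
        monotonic = all(d <= 0 for d in deltas)
    if not monotonic:
        raise ValueError("run is not monotonic in the direction of the first delta")
    return values[0], zigzag_encode(first), [abs(d) for d in deltas[1:]]


def decode_delta_run(base, delta_base_zz, magnitudes):
    """Rebuild the run; the sign of every delta comes from the delta base."""
    first = zigzag_decode(delta_base_zz)
    sign = -1 if first < 0 else 1
    out = [base, base + first]
    for m in magnitudes:
        out.append(out[-1] + sign * m)
    return out
```

For example, the decreasing run 10, 7, 5, 5, 1 would be stored as base 10, delta base zigzag(-3) = 5, and magnitudes 2, 0, 4. Only the delta base needs a sign, so the remaining deltas can be bit packed as unsigned values.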

2) Is there any way we can reuse ORC's MAJOR and MINOR version, as supported in HIVE-4724,
to determine whether we need to use the new integer encoding or the old one?
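
As a rough illustration of what point 2 is asking for, the sketch below picks an integer encoding from a (major, minor) file version. Everything here is hypothetical: the `OrcVersion` type, the function name, and in particular the threshold version are placeholders, not values taken from HIVE-4724 or from the patch.

```python
from typing import NamedTuple


class OrcVersion(NamedTuple):
    major: int
    minor: int


# Placeholder threshold: the first file version assumed to carry the new encoding.
FIRST_VERSION_WITH_NEW_ENCODING = OrcVersion(0, 12)


def integer_encoding_for(version: OrcVersion) -> str:
    """Return "new" or "old" depending on the file version.

    NamedTuple instances compare like plain tuples, so (major, minor) pairs
    are ordered by major first and then by minor.
    """
    return "new" if version >= FIRST_VERSION_WITH_NEW_ENCODING else "old"
```

A reader could call `integer_encoding_for(OrcVersion(major, minor))` once per file and select the matching decoder, so that files written before the change remain readable.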

> The RLE encoding for ORC can be improved
> ----------------------------------------
>                 Key: HIVE-4123
>                 URL:
>             Project: Hive
>          Issue Type: New Feature
>          Components: File Formats
>            Reporter: Owen O'Malley
>            Assignee: Prasanth J
>         Attachments: HIVE-4123.1.git.patch.txt, HIVE-4123.2.git.patch.txt, ORC-Compression-Ratio-Comparison.xlsx
> The run length encoding of integers can be improved:
> * tighter bit packing
> * allow delta encoding
> * allow longer runs
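
The "tighter bit packing" item in the issue description can be sketched as follows: find the smallest bit width that fits every value in a run, then pack the values back to back at that width. This is a simplified illustration with hypothetical function names and an assumed big-endian, zero-padded layout, not the encoding proposed in the attached patches.

```python
def bit_width(values):
    """Smallest number of bits (at least 1) that can hold every non-negative value."""
    return max(1, max(values).bit_length())


def pack(values, width):
    """Pack non-negative integers at a fixed bit width, most significant bit first."""
    acc = 0
    for v in values:
        if v < 0 or v >> width:
            raise ValueError(f"{v} does not fit in {width} bits")
        acc = (acc << width) | v
    nbits = width * len(values)
    nbytes = (nbits + 7) // 8
    acc <<= nbytes * 8 - nbits  # pad the final byte with zero bits
    return acc.to_bytes(nbytes, "big")


def unpack(data, width, count):
    """Read back `count` values of `width` bits each from packed bytes."""
    acc = int.from_bytes(data, "big")
    total = len(data) * 8
    mask = (1 << width) - 1
    return [(acc >> (total - width * (i + 1))) & mask for i in range(count)]
```

With this layout, the four values 1, 5, 3, 7 need 3 bits each and fit in 2 bytes instead of one byte or more per value.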

