hive-dev mailing list archives

From "Prasanth J (JIRA)" <>
Subject [jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved
Date Wed, 07 Aug 2013 21:59:47 GMT


Prasanth J commented on HIVE-4123:

Updated the Excel spreadsheet, which compares the existing RLE (baseline) against the new RLE.
The latest patch, revised after code review, achieves a better compression ratio than both the
old patch and the existing RLE. I have also added encoding and decoding times to the spreadsheet,
but those numbers are not very reliable because each was measured over a single iteration. For a
more stable measurement, I also ran encoding and decoding over a 25M-row file for 5 iterations
and averaged the last 3. HIVE-4123.2.git.patch.txt took 2072 ms on average to encode the 25M-row
file and 920 ms to decode it, whereas HIVE-4123.6.txt took 1374 ms on average to encode and
874 ms to decode.
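The measurement procedure described above (5 iterations, averaging only the last 3 so that the first runs act as warm-up) can be sketched as a small timing harness. This is a minimal Python sketch under stated assumptions: `benchmark_ms` and `mean_of_last` are hypothetical helpers, and the benchmark actually used for these numbers is not shown in this message.

```python
import time
from typing import Callable, Sequence


def mean_of_last(timings_ms: Sequence[float], keep_last: int = 3) -> float:
    """Average only the last `keep_last` timings; earlier runs are treated as warm-up."""
    if not 0 < keep_last <= len(timings_ms):
        raise ValueError("keep_last must be between 1 and len(timings_ms)")
    tail = timings_ms[-keep_last:]
    return sum(tail) / len(tail)


def benchmark_ms(fn: Callable[[], object], iterations: int = 5, keep_last: int = 3) -> float:
    """Run fn `iterations` times and return the mean wall-clock time (ms) of the last runs.

    Hypothetical harness: it mirrors the "5 iterations, average of the last 3"
    procedure described in the comment, not the actual Hive test code.
    """
    timings_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return mean_of_last(timings_ms, keep_last)


if __name__ == "__main__":
    # Stand-in workload; replace with the real encode/decode call under test.
    data = list(range(1_000_000))
    print(f"{benchmark_ms(lambda: sum(data)):.1f} ms")
```

Applied to the averages reported above, encoding time drops from 2072 ms to 1374 ms (about 34% less, roughly 1.5x faster), while decoding drops from 920 ms to 874 ms (about 5% less).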

> The RLE encoding for ORC can be improved
> ----------------------------------------
>                 Key: HIVE-4123
>                 URL:
>             Project: Hive
>          Issue Type: New Feature
>          Components: File Formats
>    Affects Versions: 0.12.0
>            Reporter: Owen O'Malley
>            Assignee: Prasanth J
>              Labels: orcfile
>             Fix For: 0.12.0
>         Attachments: HIVE-4123.1.git.patch.txt, HIVE-4123.2.git.patch.txt, HIVE-4123.3.patch.txt,
>                      HIVE-4123.4.patch.txt, HIVE-4123.5.txt, HIVE-4123.6.txt, ORC-Compression-Ratio-Comparison.xlsx
> The run-length encoding of integers can be improved:
> * tighter bit packing
> * allow delta encoding
> * allow longer runs
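The three items above can be illustrated with a toy integer encoder. It detects runs with a constant delta, which covers both repeated values and arithmetic sequences. It caps runs at a configurable length and packs leftover literals with the smallest bit width that fits their zigzag-encoded values. This is a simplified Python sketch, not the ORC on-disk format or the code in the HIVE-4123 patches. The `DeltaRun`/`PackedLiterals` block types and the `min_run`/`max_run` parameters are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class DeltaRun:
    """Run of `length` values: base, base + delta, base + 2*delta, ..."""
    base: int
    delta: int
    length: int


@dataclass
class PackedLiterals:
    """Literal values that would be bit-packed at `bit_width` bits each."""
    bit_width: int
    values: List[int]


Block = Union[DeltaRun, PackedLiterals]


def zigzag(n: int) -> int:
    """Map signed to unsigned so small magnitudes need few bits: 0,-1,1,-2 -> 0,1,2,3."""
    return n << 1 if n >= 0 else ((-n) << 1) - 1


def encode(values: List[int], min_run: int = 3, max_run: int = 512) -> List[Block]:
    """Toy encoder (hypothetical, not the ORC format): delta runs plus bit-packed literals."""
    blocks: List[Block] = []
    literals: List[int] = []

    def flush_literals() -> None:
        if literals:
            # Tighter bit packing: minimum width for this group, not whole bytes.
            width = max(1, max(zigzag(v) for v in literals).bit_length())
            blocks.append(PackedLiterals(width, literals.copy()))
            literals.clear()

    i, n = 0, len(values)
    while i < n:
        run_len, delta = 1, 0
        if i + 1 < n:
            delta = values[i + 1] - values[i]
            run_len = 2
            # Extend while the delta stays constant and the run is below max_run.
            while (i + run_len < n and run_len < max_run
                   and values[i + run_len] - values[i + run_len - 1] == delta):
                run_len += 1
        if run_len >= min_run:
            flush_literals()
            blocks.append(DeltaRun(values[i], delta, run_len))
            i += run_len
        else:
            literals.append(values[i])
            i += 1
    flush_literals()
    return blocks


def decode(blocks: List[Block]) -> List[int]:
    """Expand blocks back into the original integer sequence."""
    out: List[int] = []
    for block in blocks:
        if isinstance(block, DeltaRun):
            out.extend(block.base + k * block.delta for k in range(block.length))
        else:
            out.extend(block.values)
    return out


if __name__ == "__main__":
    data = [5, 5, 5, 5, 1, 2, 3, 4, 5, 6, 100, -3, 7]
    for block in encode(data):
        print(block)
```

One design consequence of this sketch: the literal group's width comes from its largest zigzag value, so a single outlier (100 in the example) widens every value in the group.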

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see:
