hive-dev mailing list archives

From "Prasanth J (JIRA)" <>
Subject [jira] [Commented] (HIVE-7219) Improve performance of serialization utils in ORC
Date Thu, 19 Jun 2014 19:58:24 GMT


Prasanth J commented on HIVE-7219:

bq. Question: Should the following information from Prasanth J also be documented, and, if
so, does it belong in the ORC wikidoc or with the parameter description in Configuration Properties?
bq. For integers, this patch will improve only very specific cases. If the encoding uses SHORT_REPEAT,
DELTA (esp. fixed delta), or PATCHED_BASE, then this patch will NOT have any effect, as these
encodings do not use bit packing. The bit-packed encodings, such as DIRECT and DELTA (variable
delta), will see improvements.

I think these are too specific for it to be put into user documentation.
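To make the distinction in the quoted comment concrete, here is a minimal Java sketch of generic bit packing, the step that the bit-packed encodings (DIRECT, variable-width DELTA) go through and that run-based encodings such as SHORT_REPEAT skip. It assumes MSB-first bit order within each byte. The class and method names (BitPackSketch, writeInts, readInts) are hypothetical and only echo the names in the issue; this is not Hive's SerializationUtils code. The per-bit inner loop is the kind of work an unrolled implementation tries to avoid.

```java
import java.util.Arrays;

// Hypothetical sketch, not Hive's SerializationUtils: generic MSB-first bit packing.
class BitPackSketch {

    // Packs the low `bitSize` bits of each value into a byte array, most significant bit first.
    static byte[] writeInts(long[] input, int bitSize) {
        int totalBits = input.length * bitSize;
        byte[] out = new byte[(totalBits + 7) / 8];
        int bitPos = 0; // absolute bit offset into `out`
        for (long value : input) {
            // Emit bits (bitSize - 1) down to 0, one at a time.
            for (int b = bitSize - 1; b >= 0; b--) {
                if (((value >>> b) & 1L) != 0) {
                    out[bitPos >>> 3] |= (byte) (0x80 >>> (bitPos & 7));
                }
                bitPos++;
            }
        }
        return out;
    }

    // Inverse of writeInts: reads `count` values of `bitSize` bits each.
    static long[] readInts(byte[] in, int count, int bitSize) {
        long[] result = new long[count];
        int bitPos = 0;
        for (int i = 0; i < count; i++) {
            long value = 0;
            for (int b = 0; b < bitSize; b++) {
                // Extract one bit; the sign extension of `in[...]` does not affect the low 8 bits.
                int bit = (in[bitPos >>> 3] >>> (7 - (bitPos & 7))) & 1;
                value = (value << 1) | bit;
                bitPos++;
            }
            result[i] = value;
        }
        return result;
    }

    public static void main(String[] args) {
        long[] values = {1, 5, 3, 7};
        byte[] packed = writeInts(values, 3); // 4 values * 3 bits = 12 bits -> 2 bytes
        System.out.println(packed.length);    // prints 2
        System.out.println(Arrays.toString(readInts(packed, values.length, 3))); // prints [1, 5, 3, 7]
    }
}
```

Every value costs bitSize iterations of shift-and-mask work, regardless of how convenient the width is.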

> Improve performance of serialization utils in ORC
> -------------------------------------------------
>                 Key: HIVE-7219
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>          Components: File Formats
>    Affects Versions: 0.14.0
>            Reporter: Prasanth J
>            Assignee: Prasanth J
>              Labels: TODOC14
>             Fix For: 0.14.0
>         Attachments: HIVE-7219.1.patch, HIVE-7219.2.patch, HIVE-7219.3.patch, HIVE-7219.4.patch,
> ORC uses serialization utils heavily for reading and writing data. The bit-packing and
> unpacking code in writeInts() and readInts() can be unrolled for better performance. Also,
> double reader/writer performance can be improved by bulk reading/writing from/to a byte array.
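The two ideas in the issue description can be illustrated with a small Java sketch. It is not the HIVE-7219 patch: the class and method names (UnrolledSketch, unpack8, unpack4, readDoubles, writeDoubles) are hypothetical, MSB-first bit order is assumed for the packed integers, and little-endian IEEE 754 is assumed for the doubles. For byte-friendly widths, "unrolling" replaces the per-bit loop with a fixed pattern of whole-byte operations, and bulk double reading decodes an entire byte array through java.nio instead of assembling each double byte by byte.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical sketch of the two optimizations; not Hive's actual API.
class UnrolledSketch {

    // Specialized unpacking for bitSize == 8: one value per byte, no per-bit loop.
    static void unpack8(byte[] in, long[] out, int count) {
        for (int i = 0; i < count; i++) {
            out[i] = in[i] & 0xFFL;
        }
    }

    // Specialized unpacking for bitSize == 4: two values per byte, high nibble first
    // (MSB-first order assumed).
    static void unpack4(byte[] in, long[] out, int count) {
        int i = 0;
        int byteIdx = 0;
        for (; i + 1 < count; i += 2) {
            int b = in[byteIdx++] & 0xFF;
            out[i] = b >>> 4;
            out[i + 1] = b & 0x0F;
        }
        if (i < count) { // odd count: the last value sits in the high nibble
            out[i] = (in[byteIdx] & 0xFF) >>> 4;
        }
    }

    // Bulk double reading: decode `count` doubles from a byte array in one call.
    static double[] readDoubles(byte[] in, int count) {
        double[] out = new double[count];
        ByteBuffer.wrap(in, 0, count * Double.BYTES)
                  .order(ByteOrder.LITTLE_ENDIAN)
                  .asDoubleBuffer()
                  .get(out);
        return out;
    }

    // Bulk double writing: encode all values into one byte array.
    static byte[] writeDoubles(double[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Double.BYTES)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        buf.asDoubleBuffer().put(values);
        return buf.array();
    }

    public static void main(String[] args) {
        long[] vals = new long[3];
        unpack4(new byte[]{(byte) 0x5A, (byte) 0xC0}, vals, 3);
        System.out.println(Arrays.toString(vals)); // prints [5, 10, 12]
        System.out.println(readDoubles(writeDoubles(new double[]{1.5, -2.25}), 2)[1]); // prints -2.25
    }
}
```

A real implementation would dispatch on bitSize and fall back to the generic loop for widths without a specialized path.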

This message was sent by Atlassian JIRA
