hive-dev mailing list archives

From "Prasanth J (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-7219) Improve performance of serialization utils in ORC
Date Mon, 16 Jun 2014 21:38:04 GMT

     [ https://issues.apache.org/jira/browse/HIVE-7219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth J updated HIVE-7219:
-----------------------------

    Attachment: HIVE-7219.4.patch

[~hagleitn] Thanks for taking a look at the failures. The file size changes are caused by
the default encoding ("SPEED"), so we will see a slight increase in file size. I regenerated
the affected golden files. The double precision issue occurred because the double reader
was not masking the first byte that it read. Both issues are fixed in this patch.
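
For readers unfamiliar with this class of bug, here is a minimal sketch of why masking matters when a double is rebuilt from a byte array. It is not the code from the patch: the class name DoubleBytesSketch and its methods are hypothetical, and little-endian byte order is assumed. In Java a byte is signed, so an unmasked byte with its high bit set is sign-extended when widened and overwrites the higher-order bits of the result.

```java
// Illustrative sketch only; names and layout are assumptions, not the HIVE-7219 patch.
final class DoubleBytesSketch {

    // Write the IEEE 754 bits of v into buf[off..off+7], least significant byte first.
    static void writeDouble(byte[] buf, int off, double v) {
        long bits = Double.doubleToLongBits(v);
        for (int i = 0; i < 8; i++) {
            buf[off + i] = (byte) (bits >>> (8 * i));
        }
    }

    // Rebuild the long from 8 bytes. Every byte, including the first one,
    // is masked with 0xff so that negative byte values are not sign-extended.
    static long readLongLE(byte[] buf, int off) {
        return  (buf[off]     & 0xffL)
             | ((buf[off + 1] & 0xffL) << 8)
             | ((buf[off + 2] & 0xffL) << 16)
             | ((buf[off + 3] & 0xffL) << 24)
             | ((buf[off + 4] & 0xffL) << 32)
             | ((buf[off + 5] & 0xffL) << 40)
             | ((buf[off + 6] & 0xffL) << 48)
             | ((buf[off + 7] & 0xffL) << 56);
    }

    static double readDouble(byte[] buf, int off) {
        return Double.longBitsToDouble(readLongLE(buf, off));
    }
}
```

If the mask on buf[off] were dropped, any value whose lowest byte is 0x80 or above would come back with all upper bits set, which would show up as a wrong value after a round trip.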

> Improve performance of serialization utils in ORC
> -------------------------------------------------
>
>                 Key: HIVE-7219
>                 URL: https://issues.apache.org/jira/browse/HIVE-7219
>             Project: Hive
>          Issue Type: Improvement
>          Components: File Formats
>    Affects Versions: 0.14.0
>            Reporter: Prasanth J
>            Assignee: Prasanth J
>         Attachments: HIVE-7219.1.patch, HIVE-7219.2.patch, HIVE-7219.3.patch, HIVE-7219.4.patch, orc-read-perf-jmh-benchmark.png
>
>
> ORC uses serialization utils heavily for reading and writing data. The bit-packing and
> unpacking code in writeInts() and readInts() can be unrolled for better performance. Also,
> double reader/writer performance can be improved by bulk reading from and writing to a byte array.
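
The unrolling idea in the description can be illustrated with a small sketch. This is not the readInts() code from ORC or from the attached patches: the class BitUnpackSketch, its method names, and the choice of a 4-bit width are assumptions made for illustration. The generic version extracts one bit at a time, while the specialized version handles a fixed width with straight-line shifts and masks, which is the kind of transformation the issue proposes.

```java
// Illustrative sketch only; not the actual ORC SerializationUtils code.
final class BitUnpackSketch {

    // Generic path: read n values of bitWidth bits each, most significant bit first.
    static void unpackGeneric(byte[] in, int bitWidth, long[] out, int n) {
        int bitPos = 0;
        for (int i = 0; i < n; i++) {
            long v = 0;
            for (int b = 0; b < bitWidth; b++, bitPos++) {
                int bit = (in[bitPos >>> 3] >>> (7 - (bitPos & 7))) & 1;
                v = (v << 1) | bit;
            }
            out[i] = v;
        }
    }

    // Specialized path for 4-bit values: two values per input byte,
    // with no per-bit inner loop.
    static void unpack4(byte[] in, long[] out, int n) {
        int i = 0;
        int p = 0;
        for (; i + 1 < n; i += 2, p++) {
            int b = in[p] & 0xff;
            out[i] = b >>> 4;
            out[i + 1] = b & 0x0f;
        }
        if (i < n) {
            // Odd count: only the high nibble of the last byte is used.
            out[i] = (in[p] & 0xff) >>> 4;
        }
    }
}
```

Both methods produce the same values for a 4-bit width; any speed difference would need to be measured, for example with a JMH benchmark like the one attached to the issue.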



--
This message was sent by Atlassian JIRA
(v6.2#6252)
