hive-dev mailing list archives

From "Prasanth J (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-7219) Improve performance of serialization utils in ORC
Date Wed, 11 Jun 2014 21:45:02 GMT

     [ https://issues.apache.org/jira/browse/HIVE-7219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth J updated HIVE-7219:
-----------------------------

    Attachment: orc-read-perf-jmh-benchmark.png

Ran some benchmarks to measure the reader improvements. Used JMH to run the benchmarks with 10 warmup
iterations and 10 measurement iterations. Only the datasets that make use of bit packing were
chosen for this benchmark.
Number of rows per dataset:
inventory_col2 and inventory_col4: 11745000
twitter_census_api_id: 24556361
twitter_search_id: 9396618
github_payload_size: 3216293
aol_querylog_epoch: 3558411
random.nextLong(): 10000000

> Improve performance of serialization utils in ORC
> -------------------------------------------------
>
>                 Key: HIVE-7219
>                 URL: https://issues.apache.org/jira/browse/HIVE-7219
>             Project: Hive
>          Issue Type: Improvement
>          Components: File Formats
>    Affects Versions: 0.14.0
>            Reporter: Prasanth J
>            Assignee: Prasanth J
>         Attachments: HIVE-7219.1.patch, orc-read-perf-jmh-benchmark.png
>
>
> ORC uses serialization utils heavily for reading and writing data. The bit packing and
> unpacking code in writeInts() and readInts() can be unrolled for better performance. Also,
> double reader/writer performance can be improved by bulk reading/writing from/to a byte array.
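The two ideas in the description can be illustrated in plain Java. This is a minimal, hypothetical sketch, not code from HIVE-7219.1.patch: the class `SerUtilsSketch` and all of its method names are illustrative, and it assumes MSB-first bit packing and little-endian IEEE 754 doubles. `readIntsGeneric` pulls any width bit by bit, `unpack4Unrolled` handles only the 4-bit width with two fixed shifts per byte (the per-bit loop is unrolled away), and `readDoublesBulk` decodes a whole byte array through one `DoubleBuffer` view instead of assembling each value from eight single-byte reads as `readDoubleBytewise` does.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/**
 * Hypothetical sketch, not code from HIVE-7219.1.patch.
 * Assumes MSB-first bit packing and little-endian IEEE 754 doubles.
 */
final class SerUtilsSketch {

    private SerUtilsSketch() {}

    /** Generic unpacking: pulls bitSize bits per value, refilling one byte at a time. */
    static void readIntsGeneric(byte[] in, long[] out, int len, int bitSize) {
        int pos = 0;        // next byte to load from 'in'
        int bitsLeft = 0;   // unread bits remaining in 'current'
        int current = 0;
        for (int i = 0; i < len; i++) {
            long result = 0;
            int bitsLeftToRead = bitSize;
            while (bitsLeftToRead > bitsLeft) {
                // consume the remaining low bits of 'current', then load the next byte
                result <<= bitsLeft;
                result |= current & ((1 << bitsLeft) - 1);
                bitsLeftToRead -= bitsLeft;
                current = in[pos++] & 0xFF;
                bitsLeft = 8;
            }
            if (bitsLeftToRead > 0) {
                // take the top 'bitsLeftToRead' of the unread bits in 'current'
                result <<= bitsLeftToRead;
                bitsLeft -= bitsLeftToRead;
                result |= (current >> bitsLeft) & ((1 << bitsLeftToRead) - 1);
            }
            out[i] = result;
        }
    }

    /** Width-specialized 4-bit unpacking: no per-bit loop, two fixed shifts per byte. */
    static void unpack4Unrolled(byte[] in, long[] out, int len) {
        int b = 0;
        int i = 0;
        for (; i + 1 < len; i += 2) {
            int v = in[b++] & 0xFF;
            out[i] = v >>> 4;        // high nibble holds the earlier value
            out[i + 1] = v & 0x0F;   // low nibble holds the next value
        }
        if (i < len) {
            out[i] = (in[b] & 0xFF) >>> 4;   // odd count: last value is a lone high nibble
        }
    }

    /** Per-value decoding: assembles one double from eight single-byte reads. */
    static double readDoubleBytewise(byte[] in, int off) {
        long bits = 0;
        for (int k = 0; k < 8; k++) {
            bits |= (in[off + k] & 0xFFL) << (8 * k);   // byte k supplies bits 8k..8k+7
        }
        return Double.longBitsToDouble(bits);
    }

    /** Bulk decoding: one little-endian DoubleBuffer view, one bulk get for all values. */
    static void readDoublesBulk(byte[] in, double[] out, int len) {
        ByteBuffer.wrap(in, 0, len * 8)
                  .order(ByteOrder.LITTLE_ENDIAN)
                  .asDoubleBuffer()
                  .get(out, 0, len);
    }
}
```

The unrolled path only helps when the width is known up front, so a reader built this way would dispatch on `bitSize` and fall back to the generic loop for widths without a specialized version; that dispatch is likewise an assumption of this sketch.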



--
This message was sent by Atlassian JIRA
(v6.2#6252)
