Date: Thu, 19 Jun 2014 19:58:24 +0000 (UTC)
From: "Prasanth J (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Commented] (HIVE-7219) Improve performance of serialization utils in ORC

[ https://issues.apache.org/jira/browse/HIVE-7219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14037771#comment-14037771 ]

Prasanth J commented on HIVE-7219:
----------------------------------

bq. Question: Should the following information from Prasanth J also be documented, and if so does it belong in the ORC wikidoc or with the parameter description in Configuration Properties?
bq. For integers, this patch will improve only very specific cases. If the encoding uses SHORT_REPEAT, DELTA (esp. fixed delta), or PATCHED_BLOB, then this patch will NOT have any effect, as these encodings do not use bit packing. The bit-packed encodings such as DIRECT and DELTA (variable delta) will see improvements.

I think these details are too specific to be put into user documentation.

> Improve performance of serialization utils in ORC
> -------------------------------------------------
>
>                 Key: HIVE-7219
>                 URL: https://issues.apache.org/jira/browse/HIVE-7219
>             Project: Hive
>          Issue Type: Improvement
>          Components: File Formats
>    Affects Versions: 0.14.0
>            Reporter: Prasanth J
>            Assignee: Prasanth J
>              Labels: TODOC14
>             Fix For: 0.14.0
>
>         Attachments: HIVE-7219.1.patch, HIVE-7219.2.patch, HIVE-7219.3.patch, HIVE-7219.4.patch, orc-read-perf-jmh-benchmark.png
>
>
> ORC uses serialization utils heavily for reading and writing data. The bit-packing and unpacking code in writeInts() and readInts() can be unrolled for better performance. Also, double reader/writer performance can be improved by bulk reading/writing from/to a byte array.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
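
For readers unfamiliar with the two ideas named in the issue description (unrolling the bit-unpacking loop and reading doubles in bulk from a byte array), here is a minimal Java sketch. It is not the code from the HIVE-7219 patches. The class and method names (SerializationSketch, unpackGeneric, unpack4, readDoubles) are hypothetical, and the big-endian bit order and little-endian double layout are assumptions made for the illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Illustrative sketch only; hypothetical names, not the HIVE-7219 patch. */
final class SerializationSketch {

  /**
   * Generic unpacking: builds each value one bit at a time.
   * Assumes values are packed most-significant bit first.
   */
  static void unpackGeneric(byte[] in, int bitWidth, long[] out, int count) {
    int bitPos = 0; // absolute bit offset into the input
    for (int i = 0; i < count; i++) {
      long value = 0;
      for (int b = 0; b < bitWidth; b++, bitPos++) {
        int bit = (in[bitPos >>> 3] >>> (7 - (bitPos & 7))) & 1;
        value = (value << 1) | bit;
      }
      out[i] = value;
    }
  }

  /**
   * Specialised ("unrolled") unpacking for a bit width of 4:
   * each input byte yields exactly two values, so the inner
   * per-bit loop disappears.
   */
  static void unpack4(byte[] in, long[] out, int count) {
    int i = 0;
    int inPos = 0;
    for (; i + 1 < count; i += 2) {
      int b = in[inPos++] & 0xFF;
      out[i] = b >>> 4;       // high nibble
      out[i + 1] = b & 0x0F;  // low nibble
    }
    if (i < count) {
      // Odd count: only the high nibble of the last byte is used.
      out[i] = (in[inPos] & 0xFF) >>> 4;
    }
  }

  /**
   * Bulk double read: decodes `count` doubles from a byte array in one
   * call instead of one stream read per value. Little-endian layout is
   * an assumption of this sketch.
   */
  static double[] readDoubles(byte[] in, int offset, int count) {
    double[] out = new double[count];
    ByteBuffer.wrap(in, offset, count * Double.BYTES)
        .order(ByteOrder.LITTLE_ENDIAN)
        .asDoubleBuffer()
        .get(out);
    return out;
  }
}
```

The specialised method trades generality for a loop body with no per-bit branching; a real implementation would presumably need one such path per supported bit width, with the generic loop as a fallback.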