Date: Thu, 31 Jul 2014 18:45:38 +0000 (UTC)
From: "Prasanth J (JIRA)"
To: hive-dev@hadoop.apache.org
Subject: [jira] [Updated] (HIVE-6382) PATCHED_BLOB encoding in ORC will corrupt data in some cases

[ https://issues.apache.org/jira/browse/HIVE-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth J updated HIVE-6382:
-----------------------------
Description:
In PATCHED_BLOB encoding (added in HIVE-4123), gapVsPatchList is an array of longs in which each entry stores the gap (g) between patched values together with the patch value (p). The gap can be at most 511 and is encoded in at most 8 bits. Patch values, however, can take more than 56 bits. When a patch value needs more than 56 bits, the combined width of p and g exceeds 64 bits, so the pair can no longer be packed into a single long.
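To make the 56 + 8 = 64 bit budget concrete, here is a minimal Java sketch. It assumes, hypothetically, that each gapVsPatchList entry stores the gap above the patch bits as (g << patchWidth) | p; the class GapPatchPacking and its methods are illustrative names, not Hive's RunLengthIntegerWriterV2 code.

```java
/**
 * Hypothetical sketch, not Hive's RunLengthIntegerWriterV2 code: packs a gap (g)
 * and a patch value (p) into one 64-bit long as (g << patchWidth) | p.
 */
public final class GapPatchPacking {

  // Width budget taken from the issue description: the gap uses at most 8 bits.
  public static final int MAX_GAP_WIDTH = 8;

  // True if a patchWidth-bit patch and a gapWidth-bit gap fit in one long.
  public static boolean fitsInLong(int patchWidth, int gapWidth) {
    return patchWidth + gapWidth <= Long.SIZE;
  }

  // Places the gap above the patch bits. Bits shifted past bit 63 are silently lost.
  public static long pack(long gap, long patch, int patchWidth) {
    return (gap << patchWidth) | patch;
  }

  // Recovers the gap from the high bits (unsigned shift, so no sign extension).
  public static long unpackGap(long packed, int patchWidth) {
    return packed >>> patchWidth;
  }

  // Recovers the patch from the low patchWidth bits; valid for patchWidth in [1, 63].
  public static long unpackPatch(long packed, int patchWidth) {
    long mask = (1L << patchWidth) - 1;
    return packed & mask;
  }

  public static void main(String[] args) {
    long gap = 255L; // needs the full 8-bit gap width

    // 56-bit patch + 8-bit gap = 64 bits: the entry round-trips.
    long patch56 = (1L << 55) | 12345L;
    long ok = pack(gap, patch56, 56);
    System.out.println(fitsInLong(56, MAX_GAP_WIDTH));                              // true
    System.out.println(unpackGap(ok, 56) == gap && unpackPatch(ok, 56) == patch56); // true

    // 57-bit patch + 8-bit gap = 65 bits: the gap's top bit falls off the long.
    long patch57 = (1L << 56) | 12345L;
    long bad = pack(gap, patch57, 57);
    System.out.println(fitsInLong(57, MAX_GAP_WIDTH));                              // false
    System.out.println(unpackGap(bad, 57));                                        // 127
  }
}
```

With a 56-bit patch the 8-bit gap decodes intact; with a 57-bit patch the highest gap bit is shifted out of the long, so under this assumed layout a gap of 255 would decode as 127.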
This results in data corruption whenever a patch value needs more than 56 bits. The stack trace looks like this:
{code}
Caused by: java.lang.ArrayIndexOutOfBoundsException: 3
	at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerWriterV2.preparePatchedBlob(RunLengthIntegerWriterV2.java:593)
	at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerWriterV2.determineEncoding(RunLengthIntegerWriterV2.java:541)
	at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerWriterV2.write(RunLengthIntegerWriterV2.java:746)
	at org.apache.hadoop.hive.ql.io.orc.WriterImpl$IntegerTreeWriter.write(WriterImpl.java:744)
	at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.write(WriterImpl.java:1320)
	at org.apache.hadoop.hive.ql.io.orc.WriterImpl.addRow(WriterImpl.java:1849)
	at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.write(OrcOutputFormat.java:75)
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:638)
	at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:501)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
	at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:88)
	at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:501)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
	at org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45)
	at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:501)
	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:249)
	... 7 more
{code}

was:
In PATCHED_BLOB encoding (added in HIVE-4123), gapVsPatchList is an array of longs in which each entry stores the gap (g) between patched values together with the patch value (p). The gap can be at most 511 and is encoded in at most 8 bits. Patch values, however, can take more than 56 bits. When a patch value needs more than 56 bits, the combined width of p and g exceeds 64 bits, so the pair can no longer be packed into a single long.
This results in data corruption whenever a patch value needs more than 56 bits.

> PATCHED_BLOB encoding in ORC will corrupt data in some cases
> ------------------------------------------------------------
>
>                 Key: HIVE-6382
>                 URL: https://issues.apache.org/jira/browse/HIVE-6382
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>    Affects Versions: 0.13.0
>            Reporter: Prasanth J
>            Assignee: Prasanth J
>              Labels: orcfile
>             Fix For: 0.13.0
>
>         Attachments: HIVE-6382.1.patch, HIVE-6382.2.patch, HIVE-6382.3.patch, HIVE-6382.4.patch, HIVE-6382.5.patch, HIVE-6382.6.patch
--
This message was sent by Atlassian JIRA
(v6.2#6252)