hive-dev mailing list archives

From "Prasanth J (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-6320) Row-based ORC reader with PPD turned on dies on BufferUnderFlowException/IndexOutOfBoundsException
Date Fri, 01 Aug 2014 23:59:39 GMT

     [ https://issues.apache.org/jira/browse/HIVE-6320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth J updated HIVE-6320:
-----------------------------

    Description: 
The ORC data reader crashes with a BufferUnderflowException while reading data row-by-row
with predicate push-down enabled, on current trunk.

Stack trace:
{code}
Caused by: java.nio.BufferUnderflowException
	at java.nio.Buffer.nextGetIndex(Buffer.java:472)
	at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:117)
	at org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.read(InStream.java:207)
	at org.apache.hadoop.hive.ql.io.orc.SerializationUtils.readInts(SerializationUtils.java:450)
	at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readDirectValues(RunLengthIntegerReaderV2.java:240)
	at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:53)
	at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:288)
	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$IntTreeReader.next(RecordReaderImpl.java:510)
	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.next(RecordReaderImpl.java:1581)
	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:2707)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:125)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:101)
{code}
 Or it could be:
{code}
Caused by: java.lang.IndexOutOfBoundsException
        at java.nio.ByteBuffer.wrap(ByteBuffer.java:352)
        at org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.readHeader(InStream.java:180)
        at org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.read(InStream.java:197)
        at org.apache.hadoop.hive.ql.io.orc.SerializationUtils.readInts(SerializationUtils.java:450)
        at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readDirectValues(RunLengthIntegerReaderV2.java:252)
        at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:59)
        at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:300)
        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$LongTreeReader.next(RecordReaderImpl.java:475)
        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.next(RecordReaderImpl.java:1159)
        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:2198)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:108)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:57)
        at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:274)
        ... 15 more
{code}

The query being run is:

{code}
set hive.vectorized.execution.enabled=false;
set hive.optimize.index.filter=true;

insert overwrite directory '/tmp/foo' select * from lineitem where l_orderkey is not null;
{code}
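(Disabling {{hive.vectorized.execution.enabled}} forces the row-by-row reader path, while {{hive.optimize.index.filter=true}} pushes the {{l_orderkey is not null}} predicate down into the ORC reader; together they exercise the row-group selection and disk-range planning described below.)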

*Reason:*
The issue lies in how the disk range boundaries are generated. If two adjacent row groups have
the same compressed block offset, then the worst-case slop added to the end offset covers only
the current compression block. In some cases the values towards the end of this compression
block stretch beyond that boundary, and fetching them causes the BufferUnderflowException
or IndexOutOfBoundsException.
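
As a rough illustration, here is a minimal sketch of the boundary computation described above (the names, the slop constant, and the structure are assumptions made up for the sketch, not the actual org.apache.hadoop.hive.ql.io.orc internals):

{code}
// Hypothetical sketch of the disk-range planning bug. Names and constants
// are illustrative assumptions, not the real RecordReaderImpl code.
class DiskRangeSketch {
  static final long COMPRESSION_BLOCK_SIZE = 256 * 1024;
  // Worst-case slop: one compression block plus a chunk header.
  static final long WORST_CASE_SLOP = COMPRESSION_BLOCK_SIZE + 3;

  /** Returns the [start, end) byte range to fetch for row group rg. */
  static long[] planRange(long[] rgCompressedOffsets, int rg, long streamLength) {
    long start = rgCompressedOffsets[rg];
    long end;
    if (rg == rgCompressedOffsets.length - 1) {
      end = streamLength;
    } else {
      // BUG: when the next row group starts in the SAME compression block
      // (rgCompressedOffsets[rg + 1] == start), end == start + slop, so the
      // fetched range holds only the current compression block. Values near
      // the tail of this row group can continue into the following block,
      // and decoding them reads past the buffer, throwing
      // BufferUnderflowException or IndexOutOfBoundsException.
      end = Math.min(streamLength, rgCompressedOffsets[rg + 1] + WORST_CASE_SLOP);
    }
    return new long[] { start, end };
  }
}
{code}

In this framing, whenever adjacent row groups share a compressed block offset, the end offset needs to extend at least one compression block past the shared offset; presumably that is the boundary condition the attached patches address.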

  was: (the same description, without the *Reason* section added in this update)

> Row-based ORC reader with PPD turned on dies on BufferUnderFlowException/IndexOutOfBoundsException
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-6320
>                 URL: https://issues.apache.org/jira/browse/HIVE-6320
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>    Affects Versions: 0.13.0
>            Reporter: Gopal V
>            Assignee: Prasanth J
>              Labels: orcfile
>             Fix For: 0.13.0
>
>         Attachments: HIVE-6320.1.patch, HIVE-6320.2.patch, HIVE-6320.2.patch, HIVE-6320.3.patch
>
>



--
This message was sent by Atlassian JIRA
(v6.2#6252)
