hive-issues mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-19479) encoded stream seek is incorrect for 0-length RGs in LLAP IO
Date Mon, 14 May 2018 18:46:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-19479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474614#comment-16474614 ]

Sergey Shelukhin commented on HIVE-19479:
-----------------------------------------

Thanks!

> encoded stream seek is incorrect for 0-length RGs in LLAP IO
> ------------------------------------------------------------
>
>                 Key: HIVE-19479
>                 URL: https://issues.apache.org/jira/browse/HIVE-19479
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>            Priority: Major
>             Fix For: 3.0.0, 3.1.0
>
>         Attachments: HIVE-19479.01.patch, HIVE-19479.patch
>
>
> The PositionProvider offset is not updated correctly and an error like this may happen:
> {noformat}
> Caused by: java.lang.IllegalArgumentException: Seek in LENGTH to 541 is outside of the data
> 	at org.apache.orc.impl.InStream$UncompressedStream.seek(InStream.java:161)
> 	at org.apache.orc.impl.InStream$UncompressedStream.seek(InStream.java:123)
> 	at org.apache.orc.impl.RunLengthIntegerReaderV2.seek(RunLengthIntegerReaderV2.java:331)
> 	at org.apache.hadoop.hive.ql.io.orc.encoded.EncodedTreeReaderFactory$StringStreamReader.seek(EncodedTreeReaderFactory.java:298)
> 	at org.apache.hadoop.hive.ql.io.orc.encoded.EncodedTreeReaderFactory$StringStreamReader.seek(EncodedTreeReaderFactory.java:258)
> 	at org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.repositionInStreams(OrcEncodedDataConsumer.java:250)
> 	at org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.decodeBatch(OrcEncodedDataConsumer.java:134)
> 	at org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.decodeBatch(OrcEncodedDataConsumer.java:62)
> {noformat}
> We found this happens when ORC writes a strange stream combination: the data stream for
> an RG has no values (all the rows are null), but the length stream has values (0s)
> for the same rows. That is technically a valid ORC file, although writing the 0s is
> completely useless.
> This may be fixed separately in ORC, but since these files now exist in the wild, we should
> handle them correctly.
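
The description above says the PositionProvider offset is not updated correctly. As a rough illustration of how a misaligned position cursor can produce an error of this shape, here is a toy model in Java. It is not the Hive or ORC code and not the fix in the attached patches: the classes Cursor and ToyStream, the method seekLength, and all the numbers are hypothetical (541 is borrowed from the trace only so the message looks familiar). The assumption is that positions for several streams are stored back to back, so a reader that skips a stream without consuming its position hands that position to the next stream.

```java
public class PositionCursorSketch {
    /** Hypothetical stand-in for a position provider: returns stored positions in order. */
    static final class Cursor {
        private final long[] positions;
        private int next = 0;

        Cursor(long... positions) {
            this.positions = positions;
        }

        long getNext() {
            return positions[next++];
        }
    }

    /** Hypothetical stream that rejects a seek beyond its length. */
    static final class ToyStream {
        private final String name;
        private final long length;

        ToyStream(String name, long length) {
            this.name = name;
            this.length = length;
        }

        void seek(Cursor cursor) {
            long pos = cursor.getNext();
            if (pos > length) {
                throw new IllegalArgumentException(
                    "Seek in " + name + " to " + pos + " is outside of the data");
            }
        }
    }

    /**
     * Seeks the LENGTH stream for a row group whose DATA stream is skipped.
     * Assumed layout of the stored positions: [DATA offset, LENGTH offset].
     */
    static String seekLength(boolean consumeSkippedPosition) {
        Cursor cursor = new Cursor(541, 12);          // invented values
        ToyStream length = new ToyStream("LENGTH", 100);

        // The DATA stream is not seeked for this row group. Its position
        // still has to be consumed so the cursor lines up with LENGTH.
        if (consumeSkippedPosition) {
            cursor.getNext();
        }
        try {
            length.seek(cursor);
            return "ok";
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(seekLength(false)); // cursor left pointing at DATA's position
        System.out.println(seekLength(true));  // cursor advanced past DATA's position
    }
}
```

In this sketch the only difference between the failing and the working call is whether the skipped stream's position is discarded before the next stream seeks.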



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
