hive-issues mailing list archives

From "Prasanth Jayachandran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-5922) In orc.InStream.CompressedStream, the desired position passed to seek can equal offsets[i] + bytes[i].remaining() when ORC predicate pushdown is enabled
Date Fri, 01 Jul 2016 19:09:11 GMT

    [ https://issues.apache.org/jira/browse/HIVE-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15359489#comment-15359489 ]

Prasanth Jayachandran commented on HIVE-5922:
---------------------------------------------

[~puneet884]/[~frankluo] Are you using the ORC writer APIs directly, or a custom MapReduce program to write ORC files? If so, can you make sure that the rows you write to ORC are not null? We recently observed a similar issue with a customer who was using a custom MR program to write ORC files; the value passed to writer.addRow() appears to have been null. The ORC writer does not accept null as a row, although the columns within a row can be null. This can be verified from the orcfiledump output ({{hive --orcfiledump <file>}}). If you can provide me the orcfiledump output, I can confirm whether that is the case.

{code}
writer.addRow(null);                                  // invalid: the row itself must not be null
writer.addRow(Arrays.asList(a, b, null, d));          // valid: individual columns may be null
writer.addRow(Arrays.asList(null, null, null, null)); // valid: every column may be null
{code}
We are planning to fix this in the writer API documentation and code to make sure users do not pass a null row.
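
For illustration, here is a minimal sketch (not from this report) of guarding against null rows with the {{org.apache.hadoop.hive.ql.io.orc}} writer API; the output path, row class, and values below are hypothetical:

{code}
// Minimal sketch: guard against null rows before calling writer.addRow().
// The path, row type, and values are made up for illustration.
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.io.orc.OrcFile;
import org.apache.hadoop.hive.ql.io.orc.Writer;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;

public class SafeOrcWriter {
  // Hypothetical row type; the reflection object inspector maps its fields to ORC columns.
  public static class MyRow {
    Integer a; Integer b; Integer c; Integer d;
    MyRow(Integer a, Integer b, Integer c, Integer d) {
      this.a = a; this.b = b; this.c = c; this.d = d;
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    ObjectInspector inspector = ObjectInspectorFactory.getReflectionObjectInspector(
        MyRow.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA);
    Writer writer = OrcFile.createWriter(new Path("/tmp/example.orc"),
        OrcFile.writerOptions(conf).inspector(inspector));
    List<MyRow> rows = Arrays.asList(
        new MyRow(1, 2, null, 4),          // fine: individual columns may be null
        new MyRow(null, null, null, null), // fine: all columns may be null
        null);                             // must not reach addRow()
    for (MyRow row : rows) {
      if (row == null) {
        continue; // skip (or fail fast); a null row is not valid input to addRow()
      }
      writer.addRow(row);
    }
    writer.close();
  }
}
{code}

A guard like this lets a null input record be dropped or logged at write time instead of producing a file that later fails at read time with seek errors like the ones reported in this issue.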

> In orc.InStream.CompressedStream, the desired position passed to seek can equal offsets[i] + bytes[i].remaining() when ORC predicate pushdown is enabled
> --------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-5922
>                 URL: https://issues.apache.org/jira/browse/HIVE-5922
>             Project: Hive
>          Issue Type: Bug
>          Components: File Formats
>            Reporter: Yin Huai
>
> Two stack traces ...
> {code}
> java.io.IOException: IO error in map input file hdfs://10.38.55.204:8020/user/hive/warehouse/ssdb_bin_compress_orc_large_0_13.db/cycle/000004_0
> 	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:236)
> 	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:210)
> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
> 	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: java.io.IOException: java.io.IOException: Seek outside of data in compressed stream Stream for column 9 kind DATA position: 21496054 length: 33790900 range: 2 offset: 1048588 limit: 1048588 range 0 = 13893791 to 1048588;  range 1 = 17039555 to 1310735;  range 2 = 20447466 to 1048588;  range 3 = 23855377 to 1048588;  range 4 = 27263288 to 1048588;  range 5 = 30409052 to 1310735 uncompressed: 262144 to 262144 to 21496054
> 	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
> 	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
> 	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:276)
> 	at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
> 	at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
> 	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:108)
> 	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:230)
> 	... 9 more
> Caused by: java.io.IOException: Seek outside of data in compressed stream Stream for column 9 kind DATA position: 21496054 length: 33790900 range: 2 offset: 1048588 limit: 1048588 range 0 = 13893791 to 1048588;  range 1 = 17039555 to 1310735;  range 2 = 20447466 to 1048588;  range 3 = 23855377 to 1048588;  range 4 = 27263288 to 1048588;  range 5 = 30409052 to 1310735 uncompressed: 262144 to 262144 to 21496054
> 	at org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.seek(InStream.java:328)
> 	at org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.readHeader(InStream.java:161)
> 	at org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.read(InStream.java:205)
> 	at org.apache.hadoop.hive.ql.io.orc.SerializationUtils.readInts(SerializationUtils.java:450)
> 	at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readDirectValues(RunLengthIntegerReaderV2.java:240)
> 	at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:53)
> 	at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:288)
> 	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$IntTreeReader.next(RecordReaderImpl.java:510)
> 	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.next(RecordReaderImpl.java:1581)
> 	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:2707)
> 	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:110)
> 	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:86)
> 	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:274)
> 	... 13 more
> {code}
> {code}
> java.io.IOException: IO error in map input file hdfs://10.38.55.204:8020/user/hive/warehouse/ssdb_bin_compress_orc_large_0_13.db/cycle/000095_0
> 	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:236)
> 	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:210)
> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
> 	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: java.io.IOException: java.lang.IllegalStateException: Can't read header at compressed stream Stream for column 9 kind DATA position: 20447466 length: 20958101 range: 6 offset: 1835029 limit: 1835029 range 0 = 0 to 524294;  range 1 = 1835029 to 2097176;  range 2 = 5242940 to 1835029;  range 3 = 8650851 to 1835029;  range 4 = 11796615 to 2097176;  range 5 = 15204526 to 2097176;  range 6 = 18612437 to 1835029 uncompressed: 262144 to 262144
> 	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
> 	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
> 	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:276)
> 	at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
> 	at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
> 	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:108)
> 	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:230)
> 	... 9 more
> Caused by: java.lang.IllegalStateException: Can't read header at compressed stream Stream for column 9 kind DATA position: 20447466 length: 20958101 range: 6 offset: 1835029 limit: 1835029 range 0 = 0 to 524294;  range 1 = 1835029 to 2097176;  range 2 = 5242940 to 1835029;  range 3 = 8650851 to 1835029;  range 4 = 11796615 to 2097176;  range 5 = 15204526 to 2097176;  range 6 = 18612437 to 1835029 uncompressed: 262144 to 262144
> 	at org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.readHeader(InStream.java:195)
> 	at org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.read(InStream.java:205)
> 	at org.apache.hadoop.hive.ql.io.orc.SerializationUtils.readInts(SerializationUtils.java:450)
> 	at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readDirectValues(RunLengthIntegerReaderV2.java:240)
> 	at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:53)
> 	at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:288)
> 	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$IntTreeReader.next(RecordReaderImpl.java:510)
> 	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.next(RecordReaderImpl.java:1581)
> 	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:2707)
> 	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:110)
> 	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:86)
> 	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:274)
> 	... 13 more
> {code}



