hive-issues mailing list archives

From "Demeter Sztanko (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-11031) ORC concatenation of old files can fail while merging column statistics
Date Fri, 26 Jun 2015 09:25:04 GMT

    [ https://issues.apache.org/jira/browse/HIVE-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602613#comment-14602613 ]

Demeter Sztanko commented on HIVE-11031:
----------------------------------------

Hello [~prasanth_j], my MR jobs are getting this error when concatenating ORC files:

{code}
java.io.IOException: java.io.IOException: java.lang.IndexOutOfBoundsException: Index: 0
	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:226)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:136)
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:230)
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:210)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
	at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.io.IOException: java.lang.IndexOutOfBoundsException: Index: 0
	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:355)
	at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:105)
	at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:224)
	... 11 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 0
	at java.util.Collections$EmptyList.get(Collections.java:3212)
	at org.apache.hadoop.hive.ql.io.orc.OrcFileStripeMergeRecordReader.nextStripe(OrcFileStripeMergeRecordReader.java:82)
	at org.apache.hadoop.hive.ql.io.orc.OrcFileStripeMergeRecordReader.next(OrcFileStripeMergeRecordReader.java:71)
	at org.apache.hadoop.hive.ql.io.orc.OrcFileStripeMergeRecordReader.next(OrcFileStripeMergeRecordReader.java:31)
	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
	... 15 more
2015-06-26 08:24:19,248 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
{code}

Is this failure caused by the bug described in this ticket, or could it be a different problem?
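
For reference, the innermost frames of the trace show Collections$EmptyList.get being called from OrcFileStripeMergeRecordReader.nextStripe, which means index 0 was requested from an empty list. What that list holds inside Hive is not visible from the trace. The following is only a minimal standalone sketch of that failure mode and of one possible guard; the class EmptyListGetSketch and the helper firstOrNull are hypothetical names, not Hive code.

```java
import java.util.Collections;
import java.util.List;

public class EmptyListGetSketch {
    // Hypothetical guard: return null instead of throwing when the list is empty.
    public static <T> T firstOrNull(List<T> items) {
        return items.isEmpty() ? null : items.get(0);
    }

    public static void main(String[] args) {
        List<String> items = Collections.emptyList();
        try {
            // Same call shape as the innermost frame: get(0) on an empty list.
            items.get(0);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("caught: " + e.getMessage());
        }
        // The guarded access does not throw.
        System.out.println("guarded: " + firstOrNull(items));
    }
}
```

Whether returning null (or skipping the file) is the right behaviour for the merge reader is an assumption of this sketch; it only shows that the exception type here differs from the ClassCastException named in the issue description.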

> ORC concatenation of old files can fail while merging column statistics
> -----------------------------------------------------------------------
>
>                 Key: HIVE-11031
>                 URL: https://issues.apache.org/jira/browse/HIVE-11031
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.2.0, 1.1.0, 2.0.0
>            Reporter: Prasanth Jayachandran
>            Assignee: Prasanth Jayachandran
>            Priority: Critical
>             Fix For: 1.2.1
>
>         Attachments: HIVE-11031-branch-1.0.patch, HIVE-11031.2.patch, HIVE-11031.3.patch, HIVE-11031.4.patch, HIVE-11031.patch
>
>
> Column statistics in ORC are optional protobuf fields. Old ORC files might not have statistics for newly added types such as decimal, date, and timestamp. However, column statistics merging assumes that statistics exist for these types and invokes merge. For example, merging of TimestampColumnStatistics directly casts the received ColumnStatistics object without an instanceof check. If the ORC file contains timestamp column statistics, the cast succeeds; otherwise it throws ClassCastException.
> Also, the file merge operator swallows the exception.
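
The unchecked cast described in the issue can be illustrated with a minimal sketch. The types below (ColumnStatistics, GenericStatistics, TimestampStatistics) are hypothetical stand-ins defined only for this example, not the real ORC classes, and returning an empty Optional when the statistics are missing is an assumed policy, not necessarily what the attached patches do.

```java
import java.util.Optional;

public class StatsMergeSketch {
    // Hypothetical stand-ins for the statistics types; not the real Hive/ORC classes.
    public interface ColumnStatistics {}

    // Models an old file that carries no type-specific statistics.
    public static final class GenericStatistics implements ColumnStatistics {}

    public static final class TimestampStatistics implements ColumnStatistics {
        public final long min;
        public final long max;

        public TimestampStatistics(long min, long max) {
            this.min = min;
            this.max = max;
        }
    }

    // Unchecked merge: throws ClassCastException when 'other' is not timestamp statistics.
    public static TimestampStatistics mergeUnchecked(TimestampStatistics a, ColumnStatistics other) {
        TimestampStatistics b = (TimestampStatistics) other;
        return new TimestampStatistics(Math.min(a.min, b.min), Math.max(a.max, b.max));
    }

    // Guarded merge: checks the runtime type first and reports "no result" instead of throwing.
    public static Optional<TimestampStatistics> mergeChecked(TimestampStatistics a, ColumnStatistics other) {
        if (!(other instanceof TimestampStatistics)) {
            return Optional.empty(); // assumed policy for missing statistics
        }
        TimestampStatistics b = (TimestampStatistics) other;
        return Optional.of(new TimestampStatistics(Math.min(a.min, b.min), Math.max(a.max, b.max)));
    }
}
```

Calling mergeUnchecked with a GenericStatistics argument reproduces the ClassCastException, while mergeChecked returns an empty result that the caller has to handle explicitly.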



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
