hive-dev mailing list archives

From Sergio Peña (JIRA) <j...@apache.org>
Subject [jira] [Created] (HIVE-9873) Hive on MR throws DeprecatedParquetHiveInput exception
Date Thu, 05 Mar 2015 17:35:38 GMT
Sergio Peña created HIVE-9873:
---------------------------------

             Summary: Hive on MR throws DeprecatedParquetHiveInput exception
                 Key: HIVE-9873
                 URL: https://issues.apache.org/jira/browse/HIVE-9873
             Project: Hive
          Issue Type: Bug
            Reporter: Sergio Peña
            Assignee: Sergio Peña


The following error is thrown when column information is changed by {{projectionPusher.pushProjectionsAndFilters}}.


{noformat}
2015-02-26 15:56:40,275 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: java.io.IOException: java.io.IOException: DeprecatedParquetHiveInput : size of object differs. Value size :  23, Current Object size : 29
	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:226)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:136)
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.io.IOException: java.io.IOException: DeprecatedParquetHiveInput : size of object differs. Value size :  23, Current Object size : 29
	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:355)
	at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:105)
	at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:224)
	... 11 more
Caused by: java.io.IOException: DeprecatedParquetHiveInput : size of object differs. Value size :  23, Current Object size : 29
	at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next(ParquetRecordReaderWrapper.java:199)
	at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next(ParquetRecordReaderWrapper.java:52)
	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
	... 15 more
{noformat}

The bug is in {{ParquetRecordReaderWrapper}}. We store metadata, such as the list of columns,
in the {{Configuration/JobConf}}. The issue is that this metadata is incorrect until the call
to {{projectionPusher.pushProjectionsAndFilters}}. The current code does not use the
configuration object returned from {{projectionPusher.pushProjectionsAndFilters}} in other
sections of code, such as the creation and initialization of {{realReader}}. The end result is
that Parquet is given an empty read schema and returns all nulls. Since the join key is null,
no records are joined.
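
The pattern described above can be illustrated with a minimal, self-contained sketch. This is not the Hive code: a plain {{Map}} stands in for {{JobConf}}, and {{pushProjections}}, {{readSchema}}, {{buggyInit}}, {{fixedInit}} and the key {{sketch.columns}} are hypothetical names invented for illustration. It assumes, as the description suggests, that the pusher returns an updated configuration rather than modifying the one passed in.

```java
import java.util.HashMap;
import java.util.Map;

public class ProjectionConfSketch {
    // Hypothetical key; the real Hive property name is not given in this report.
    static final String COLUMNS_KEY = "sketch.columns";

    // Stand-in for projectionPusher.pushProjectionsAndFilters (assumption):
    // returns an updated copy and leaves the caller's configuration untouched.
    static Map<String, String> pushProjections(Map<String, String> conf) {
        Map<String, String> updated = new HashMap<>(conf);
        updated.put(COLUMNS_KEY, "id,name");
        return updated;
    }

    // Stand-in for reader initialization: derives the read schema from the conf.
    static String readSchema(Map<String, String> conf) {
        return conf.getOrDefault(COLUMNS_KEY, "");
    }

    // Buggy pattern: the returned configuration is discarded, so the reader
    // is initialized from the stale one and sees an empty column list.
    static String buggyInit(Map<String, String> original) {
        pushProjections(original);
        return readSchema(original);
    }

    // Fixed pattern: the returned configuration is used for everything that follows.
    static String fixedInit(Map<String, String> original) {
        Map<String, String> updated = pushProjections(original);
        return readSchema(updated);
    }

    public static void main(String[] args) {
        System.out.println("buggy: [" + buggyInit(new HashMap<>()) + "]");
        System.out.println("fixed: [" + fixedInit(new HashMap<>()) + "]");
    }
}
```

In this sketch the buggy path yields an empty schema string while the fixed path yields {{id,name}}, which mirrors the empty read schema described above.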



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
