hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Valluri, Sathish" <sathish.vall...@emc.com>
Subject Any sugesstions java.io.IOException: Not a data file error
Date Wed, 30 Oct 2013 08:50:37 GMT
Resending after disabling security signing..



From: Valluri, Sathish [mailto:sathish.valluri@emc.com]
Sent: Wednesday, October 30, 2013 2:17 PM
To: user@hive.apache.org
Subject: Any sugesstions java.io.IOException: Not a data file error



Hi All,



Hive Mapreduce jobs failing with the following java.io.IOException: Not a data file error
if there are files other than avro in the HDFS.

I have created a Hive external table as shown below,



CREATE EXTERNAL TABLE testTable ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES ('avro.schema.literal'='{ <schema json literal>') STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/testdata/';



Running select count(*) from testable;



When /testdata contains avro files the query works fine and gives the results properly.

If the /testdata have some other format files let's say /testdata/test.txt the query is failing
with the following error.



java.io.IOException: java.lang.reflect.InvocationTargetException at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:341)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:220)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:215) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:200)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336) at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126) at
org.apache.hadoop.mapred.Child.main(Child.java:264) Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:525) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:327)
... 11 more Caused by: java.io.IOException: Not a data file. at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97) at org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.<init>(AvroGenericRecordReader.java:72)
at org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat.getRecordReader(AvroContainerInputFormat.java:51)
at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)
... 16 more





Can anyone suggest any parameter or any changes needs to be made for the query to be successful.
Basically Hive should skip the other format files and load only the avro files when processing
data on the HDFS.



Waiting for any suggestions to resolve this issue.



Regards

Sathish Valluri


Mime
View raw message