hive-dev mailing list archives

From Aaron.Dossett <Aaron.Doss...@target.com>
Subject Hive queries fail on an external avro table with empty files
Date Fri, 25 Sep 2015 21:27:16 GMT
Situation: I have an external Avro table in Hive.  Under certain circumstances zero-length
files can end up in the top-level directory housing the external data.  This causes all Hive
queries on the table to fail.  This is with Hive 0.14, but from looking at the current code base I
think the same problem would occur there.  (A stack trace is below.)

The issue is that org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader creates a new
org.apache.avro.file.DataFileReader, and DataFileReader throws an exception when trying to
read an empty file (because the empty file lacks the magic number marking it as Avro).  It
seems like it would be straightforward to modify AvroGenericRecordReader to detect an empty file
and then behave sensibly.  For example, next() would always return false, getPos() would return
zero, etc.
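
A rough sketch of that idea, written against the plain JDK so it stands alone. Every name in it (EmptyFileGuard, RecordSource, EmptyRecordSource, open) is a hypothetical stand-in, not Hive's or Avro's actual API. An actual patch would presumably put the length check inside AvroGenericRecordReader before it constructs the DataFileReader. Returning 1.0 from getProgress() for an empty file is also just an assumption.

```java
import java.util.function.Supplier;

/** Sketch only: stand-in names, not Hive's actual classes. */
final class EmptyFileGuard {

    /** Minimal stand-in for the reader methods discussed above. */
    public interface RecordSource {
        boolean next();       // true if another record was read
        long getPos();        // current byte offset in the file
        float getProgress();  // fraction of the file consumed, 0.0 to 1.0
    }

    /** Reader for a zero-length file: no records, position always zero. */
    static final class EmptyRecordSource implements RecordSource {
        @Override public boolean next() { return false; }
        @Override public long getPos() { return 0L; }
        // Assumption: an empty file is treated as fully consumed.
        @Override public float getProgress() { return 1.0f; }
    }

    /**
     * Returns a no-op reader for a zero-length file. Otherwise defers to the
     * supplied factory, which stands in for the code that would build the
     * real Avro-backed reader. The factory is never invoked for an empty
     * file, so the "Not a data file" check is never reached.
     */
    public static RecordSource open(long fileLength, Supplier<RecordSource> realReader) {
        if (fileLength == 0L) {
            return new EmptyRecordSource();
        }
        return realReader.get();
    }

    private EmptyFileGuard() {}
}
```

With something like this, a zero-length file yields a reader that reports no records instead of throwing from the constructor, so the query can go on to the remaining files.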

If that approach sounds sensible I will open a JIRA and take a stab at a patch.  Thank you
in advance for any feedback!

-Aaron

Caused by: java.io.IOException: Not a data file.
at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:102)
at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
at org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.<init>(AvroGenericRecordReader.java:81)
at org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat.getRecordReader(AvroContainerInputFormat.java:51)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:246)
... 25 more
