avro-user mailing list archives

From Terry Healy <the...@bnl.gov>
Subject Re: Possible to include open .avro file in Map/Reduce job?
Date Fri, 18 Jan 2013 14:51:07 GMT
Thanks Doug.

In this case I could truncate the logs earlier, but then I have to go
back at some point and recombine the small files. For now, I can live
with moving the files daily.

I was unable to find a way to trap the "Invalid Sync"
(org.apache.avro.AvroRuntimeException: java.io.IOException: Invalid
sync! at

Since my mapper extends AvroMapper, and map throws exceptions, I don't
know where to trap it. Another person suggested using low-level avro
functions for this. Perhaps I need to write an avro file validator of
some sort to be run before the Map/Reduce job? This seems nasty. But I
had another M/R job fail with this error overnight, and even finding
the offending file via the logs is quite a pain.
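
A rough sketch of what such a pre-flight validator could look like, in
Java. The names AvroPreflight, ScanResult and scan are made up for
illustration and are not part of Avro. The sketch assumes only that
DataFileReader is a java.util.Iterator and that a bad block surfaces as
an unchecked AvroRuntimeException from hasNext()/next(), as in the error
above. The Avro wiring is shown in a comment and has not been run
against a cluster.

```java
import java.util.Iterator;

// Hypothetical helper: walks every record of a file before the M/R job
// runs and reports where (if anywhere) reading broke.
final class AvroPreflight {

    // Outcome of scanning one file.
    static final class ScanResult {
        final long recordsRead;          // records read before stopping
        final RuntimeException failure;  // null if the file read cleanly

        ScanResult(long recordsRead, RuntimeException failure) {
            this.recordsRead = recordsRead;
            this.failure = failure;
        }

        boolean isClean() {
            return failure == null;
        }
    }

    // Drains the iterator, counting records, and stops at the first
    // unchecked exception instead of letting it propagate.
    static <T> ScanResult scan(Iterator<T> records) {
        long count = 0;
        try {
            while (records.hasNext()) {
                records.next();
                count++;
            }
            return new ScanResult(count, null);
        } catch (RuntimeException e) {
            // With Avro this would be the AvroRuntimeException that
            // wraps "java.io.IOException: Invalid sync!".
            return new ScanResult(count, e);
        }
    }

    // Assumed wiring for a file on HDFS (needs avro and avro-mapred on
    // the classpath; opening the reader can itself throw IOException):
    //
    //   try (DataFileReader<GenericRecord> reader = new DataFileReader<>(
    //           new FsInput(path, conf),
    //           new GenericDatumReader<GenericRecord>())) {
    //       ScanResult result = AvroPreflight.scan(reader);
    //       if (!result.isClean()) {
    //           System.err.println(path + " unreadable after "
    //                   + result.recordsRead + " records: " + result.failure);
    //       }
    //   }
}
```

Running this over the input directory and printing the path of each
file that does not scan cleanly would at least name the offending file
up front, at the cost of reading every file one extra time.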

Any suggestions?


On 01/17/2013 04:36 PM, Doug Cutting wrote:
> Folks often move files once they're closed into a directory where
> they're processed to avoid issues with partially written data.  Maybe
> you could start a new log file every hour rather than every day?
> We could add an ignoreTruncation or ignoreCorruption option to
> DataFileReader that attempts to read files that might be truncated or
> corrupted.
> And yes, you can probably just catch those exceptions and exit the map
> at that point.
> Doug
> On Mon, Jan 14, 2013 at 11:22 AM, Terry Healy <thealy@bnl.gov> wrote:
>> I have a log collection application that writes .avro files within HDFS.
>> Ideally I would like to include the current day's (open for append) file
>> as one of the input files for a periodic M/R job.
>> I tried this but the Map job exited in error with the dreaded "Invalid
>> Sync!" IOException. I guess I should have expected this, but is there a
>> reasonable way around it? Can I catch the exception and just exit the
>> map at that point?
>> All suggestions appreciated.
>> -Terry