hadoop-user mailing list archives

From Bertrand Dechoux <decho...@gmail.com>
Subject Re: Skippin those gosh darn 0 byte files
Date Tue, 22 Jul 2014 21:45:45 GMT
I looked at the source out of curiosity. In the latest version (2.4), the
header is flushed when the writer is created. Of course, the key/value
classes are provided at that point. By 0 bytes, do you really mean a file
without even the header, or a file with 0 bytes of payload?
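
To tell the two cases apart on an actual file, a minimal sketch along these
lines might help. It assumes, as far as I know from the SequenceFile format,
that a file starts with the three bytes 'S', 'E', 'Q' followed by one version
byte. The class name SeqHeaderCheck and the Kind enum are hypothetical, and
the code reads from the local filesystem with plain java.nio rather than
through the Hadoop FileSystem API.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: distinguishes a truly empty file from one that at
// least starts with the SequenceFile magic bytes.
final class SeqHeaderCheck {
    enum Kind { EMPTY, TRUNCATED, NOT_SEQUENCE_FILE, HAS_SEQ_MAGIC }

    static Kind classify(Path file) throws IOException {
        if (Files.size(file) == 0) {
            return Kind.EMPTY;              // nothing was flushed, not even a header
        }
        byte[] magic = new byte[4];         // assumed layout: 'S','E','Q', version
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            in.readFully(magic);            // throws EOFException on fewer than 4 bytes
        } catch (EOFException e) {
            return Kind.TRUNCATED;
        }
        if (magic[0] == 'S' && magic[1] == 'E' && magic[2] == 'Q') {
            return Kind.HAS_SEQ_MAGIC;      // header started; payload may still be empty
        }
        return Kind.NOT_SEQUENCE_FILE;
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            System.out.println(arg + ": " + classify(Paths.get(arg)));
        }
    }
}
```

This only inspects the first four bytes. It does not validate the rest of
the header (key/value class names, compression flags, sync marker).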


On Tue, Jul 22, 2014 at 11:05 PM, Bertrand Dechoux <dechouxb@gmail.com>
wrote:

> The header is expected to contain the full names of the key class and value
> class, so if those are only detected with the first record (?), then indeed
> the file cannot respect its own format.
>
> I haven't tried it but LazyOutputFormat should solve your problem.
>
> https://hadoop.apache.org/docs/current/api/index.html?org/apache/hadoop/mapred/lib/LazyOutputFormat.html
>
> Regards
>
> Bertrand Dechoux
>
>
> On Tue, Jul 22, 2014 at 10:39 PM, Edward Capriolo <edlinuxguru@gmail.com>
> wrote:
>
>> I have two processes: one writes sequence files directly to HDFS, and the
>> other is a Hive table that reads these files.
>>
>> All works well, except that I only flush the files periodically, and
>> SequenceFile input format gets angry when it encounters 0-byte seq files.
>>
>> I was considering a flush and sync on the first record write. I was also
>> thinking I should be able to hack the sequence file input format to skip
>> 0-byte files and not throw an exception on readFully(), which it sometimes
>> does.
>>
>> Anyone ever tackled this?
>>
>
>
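
The "skip 0-byte files" idea from the original question can be sketched as a
listing step that drops zero-length files before anything tries to read
them. This is only an illustration on the local filesystem with java.nio.
The class name NonEmptyFiles is hypothetical, and on HDFS the length check
would presumably go through FileStatus.getLen() instead, which I have not
tried here.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists only the regular files in a directory whose
// length is greater than zero, so empty files never reach the reader.
final class NonEmptyFiles {
    static List<Path> list(Path dir) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                // Skip directories and anything with zero length.
                if (Files.isRegularFile(p) && Files.size(p) > 0) {
                    result.add(p);
                }
            }
        }
        return result;
    }
}
```

Filtering on length alone does not protect against a file that has a
partially written header, so the reader may still need to handle an
exception from readFully() in that case.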
