avro-user mailing list archives

From Martin Kleppmann <mkleppm...@linkedin.com>
Subject Re: AvroKeyValueInputFormat/AvroKeyValueOutputFormat vs AvroSequenceFileInputFormat/AvroSequenceFileOutputFormat
Date Fri, 23 May 2014 08:24:07 GMT
In general, you're probably better off with AvroKeyValueInputFormat/AvroKeyValueOutputFormat,
since that generates Avro data files which you can read from other applications and other
languages. Hadoop sequence files aren't really supported by anything other than Hadoop.

If your data remains entirely within Hadoop, there are cases where you might want to use sequence
files. For example, they might be used for the transient files generated during the shuffle
(output of mappers being fed into reducers).
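
To make the difference between the two on-disk formats concrete, here is a minimal sketch (not from the original thread) that tells them apart from a language other than Java by looking at the leading bytes. It relies only on the documented file headers: Avro data files start with the bytes "Obj" followed by 0x01, and Hadoop sequence files start with "SEQ" followed by a version byte. The function names sniff_format and sniff_path are made up for illustration.

```python
# Leading bytes of the two file formats discussed above.
AVRO_MAGIC = b"Obj\x01"  # Avro object container file: 'O', 'b', 'j', 0x01
SEQ_MAGIC = b"SEQ"       # Hadoop SequenceFile: 'S', 'E', 'Q', then a version byte


def sniff_format(header: bytes) -> str:
    """Classify a file by its first few bytes (hypothetical helper)."""
    if header.startswith(AVRO_MAGIC):
        return "avro-data-file"
    if header.startswith(SEQ_MAGIC):
        return "hadoop-sequence-file"
    return "unknown"


def sniff_path(path: str) -> str:
    """Read the first four bytes of a local file and classify it."""
    with open(path, "rb") as f:
        return sniff_format(f.read(4))
```

This only identifies the container format. Actually decoding the records in an Avro data file needs an Avro library for your language, whereas decoding a sequence file generally means going through Hadoop's own classes.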

Martin

On 20 May 2014, at 16:34, Jim Donofrio <donofrio111@gmail.com> wrote:
> What are the pros and cons of AvroKeyValueInputFormat/AvroKeyValueOutputFormat vs
> AvroSequenceFileInputFormat/AvroSequenceFileOutputFormat? Which is more commonly used?
> 
> They both use AvroKey, AvroValue. The only difference seems to be that one serializes into
> Avro data files and the other into Hadoop sequence files.
> 
> Thanks

