avro-user mailing list archives

From Yang <teddyyyy...@gmail.com>
Subject Re: Compression and splittable Avro files in Hadoop
Date Sat, 01 Oct 2011 18:00:15 GMT
This is my approach: although you could use Avro's data file format, I
used my own.

I use SequenceFile, RCFile, or TFile as an "envelope": I serialize each
Avro record into a byte array and write that into the envelope as the
payload.  In my tests, the TFile envelope was the fastest.
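
The envelope idea can be sketched without Hadoop at all.  This is only
an illustration under stated assumptions, not the code used in the
tests above: write_envelope and read_envelope are hypothetical helpers
standing in for a SequenceFile/RCFile/TFile writer and reader, and
encode_record uses JSON as a stand-in for Avro binary encoding.  The
point is that the container only ever sees opaque byte arrays.

```python
import io
import json
import struct


def encode_record(record):
    # Assumption: JSON stands in for Avro binary encoding here.
    return json.dumps(record, sort_keys=True).encode("utf-8")


def decode_record(payload):
    return json.loads(payload.decode("utf-8"))


def write_envelope(stream, payloads):
    """Hypothetical envelope writer: 4-byte big-endian length, then the bytes."""
    for payload in payloads:
        stream.write(struct.pack(">I", len(payload)))
        stream.write(payload)


def read_envelope(stream):
    """Yield the payloads back in order until the stream is exhausted."""
    while True:
        header = stream.read(4)
        if not header:
            return
        if len(header) < 4:
            raise ValueError("truncated length header")
        (length,) = struct.unpack(">I", header)
        payload = stream.read(length)
        if len(payload) != length:
            raise ValueError("truncated payload")
        yield payload
```

With a real envelope format, compression and split handling come from
the container, and the Avro schema has to be kept somewhere the reader
can find it, since the payload bytes do not carry it.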

On Fri, Sep 30, 2011 at 6:42 PM, Eric Hauser <ewhauser@gmail.com> wrote:
> A coworker and I were having a conversation today about choosing a
> compression algorithm for some data we are storing in Hadoop.  We have
> been using (https://github.com/tomslabs/avro-utils) for our Map/Reduce
> jobs and Haivvreo for integration with Hive.  By default, the
> avro-utils OutputFormat uses deflate compression.  Even though
> deflate/zlib/gzip files are not splittable, we decided that Avro data
> files are always splittable because individual blocks within the file
> are compressed instead of the entire file.
> Is this accurate?  Thanks.
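
To illustrate the reasoning in the question: an Avro data file
compresses each block on its own and ends each block with a 16-byte
sync marker, so a reader can start in the middle of the file by
scanning forward to the next marker.  The sketch below shows that idea
with a simplified layout.  It is not the real Avro container format
(which also has a header and stores object counts), and write_blocks
and read_split are hypothetical names.

```python
import zlib


def write_blocks(blocks, sync):
    """Compress each block independently and follow it with the sync marker."""
    out = bytearray()
    for block in blocks:
        compressed = zlib.compress(block)
        out += len(compressed).to_bytes(4, "big")  # simplified size prefix
        out += compressed
        out += sync
    return bytes(out)


def read_split(data, sync, start, end):
    """Decode the blocks whose first byte lies in [start, end)."""
    if start == 0:
        pos = 0
    else:
        # Find the first sync marker that ends at or after `start`;
        # the next block begins right after it.
        idx = data.find(sync, max(0, start - len(sync)))
        if idx < 0:
            return []
        pos = idx + len(sync)
    blocks = []
    while pos < end and pos < len(data):
        size = int.from_bytes(data[pos:pos + 4], "big")
        body_start = pos + 4
        # Each block decompresses without any state from earlier blocks.
        blocks.append(zlib.decompress(data[body_start:body_start + size]))
        pos = body_start + size + len(sync)
    return blocks
```

Because every block belongs to exactly one byte range, readers given
adjacent ranges that tile the file decode each block once between
them, whatever the cut points are.  A single gzip stream over the
whole file offers no such restart points.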
