avro-user mailing list archives

From Patrick Wendell <pwend...@gmail.com>
Subject Re: Compression and splittable Avro files in Hadoop
Date Sat, 01 Oct 2011 02:12:13 GMT
sent from my phone

On Sep 30, 2011 6:43 PM, "Eric Hauser" <ewhauser@gmail.com> wrote:

A coworker and I were having a conversation today about choosing a
compression algorithm for some data we are storing in Hadoop.  We have
been using avro-utils (https://github.com/tomslabs/avro-utils) for our
Map/Reduce jobs and Haivvreo for integration with Hive.  By default, the
avro-utils OutputFormat uses deflate compression.  Even though
deflate/zlib/gzip files are not splittable, we decided that Avro data
files are always splittable because individual blocks within the file
are compressed instead of the entire file.

Is this accurate?  Thanks.
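
To make the reasoning above concrete, here is a minimal sketch of why
per-block compression keeps a file splittable.  It is a toy container, not
the real Avro file format or the Avro API: the names write_blocks and
read_split are hypothetical, the 4-byte length prefix is an assumption made
for the sketch, and zlib.compress stands in for Avro's deflate codec.  The
one idea it borrows from Avro data files is the 16-byte sync marker between
independently compressed blocks.

```python
import os
import zlib

SYNC_SIZE = 16  # real Avro data files also use a 16-byte sync marker


def write_blocks(records, sync, records_per_block=100):
    """Toy container: sync marker, then repeated (length, compressed block, sync)."""
    out = bytearray(sync)
    for i in range(0, len(records), records_per_block):
        payload = "\n".join(records[i:i + records_per_block]).encode("utf-8")
        compressed = zlib.compress(payload)  # each block is compressed on its own
        out += len(compressed).to_bytes(4, "big")
        out += compressed
        out += sync
    return bytes(out)


def read_split(data, sync, start, end):
    """Return records from every block whose preceding sync marker starts in [start, end)."""
    records = []
    marker = data.find(sync, start)  # scan forward to the next block boundary
    while marker != -1 and marker < end:
        pos = marker + SYNC_SIZE
        if pos >= len(data):  # trailing marker, no block follows it
            break
        length = int.from_bytes(data[pos:pos + 4], "big")
        block = data[pos + 4:pos + 4 + length]
        records.extend(zlib.decompress(block).decode("utf-8").split("\n"))
        marker = pos + 4 + length  # the next sync marker directly follows the block
    return records


records = [f"record-{i}" for i in range(1000)]
sync = os.urandom(SYNC_SIZE)
data = write_blocks(records, sync)

# Three arbitrary byte ranges, standing in for Hadoop input splits.
third = len(data) // 3
splits = [(0, third), (third, 2 * third), (2 * third, len(data))]
recovered = [r for s, e in splits for r in read_split(data, sync, s, e)]
assert recovered == records
```

Each reader starts at an arbitrary byte offset, skips to the next sync
marker, and decompresses whole blocks from there, so no reader needs the
compressed bytes that came before its split.  A file compressed as a single
gzip stream offers no such restart points.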
