avro-user mailing list archives

From Scott Carey <sc...@richrelevance.com>
Subject Re: Avro speed comparison with raw logs
Date Thu, 31 Mar 2011 01:51:10 GMT
Decompression speed for gzip/deflate is approximately the same at all
compression levels.
Compression speed, however, varies by a factor of 5 or so between the
fastest level (1) and the slowest (9).

This is a useful link for gzip performance characteristics:
http://tukaani.org/lzma/benchmarks.html
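
For anyone who wants to check the level-dependent compression cost on their
own data, here is a rough sketch using plain java.util.zip (not Avro's codec
classes). The class name DeflateLevels, the helper names, and the synthetic
access-log lines are all made up for illustration; real logs will give
different sizes and timings, and a single untimed-warmup run like this is
only a coarse indication.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class DeflateLevels {

    // Compress input with java.util.zip.Deflater at the given level (1..9).
    static byte[] compress(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompress data produced by compress(); the level is not needed here.
    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated or unexpected input; stop instead of spinning
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt deflate stream", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Hypothetical stand-in for access-log data; not taken from the thread.
    static byte[] sampleLog(int lines) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lines; i++) {
            sb.append("10.0.").append(i % 256).append('.').append(i % 199)
              .append(" - - \"GET /page/").append(i * 31 % 1000)
              .append(" HTTP/1.1\" 200 ").append(1000 + i % 5000).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = sampleLog(200_000);
        int[] levels = {Deflater.BEST_SPEED, 6, Deflater.BEST_COMPRESSION};
        for (int level : levels) {
            long t0 = System.nanoTime();
            byte[] packed = compress(data, level);
            long t1 = System.nanoTime();
            byte[] unpacked = decompress(packed);
            long t2 = System.nanoTime();
            System.out.printf("level=%d in=%d out=%d compress=%dms decompress=%dms%n",
                    level, unpacked.length, packed.length,
                    (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
        }
    }
}
```

Running main prints one line per level, so the compress and decompress
columns can be compared directly across levels 1, 6 and 9.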

On 3/4/11 9:25 AM, "Doug Cutting" <cutting@apache.org> wrote:

>On 03/01/2011 09:05 PM, felix gao wrote:
>> I am running some comparison tests on a data set that I converted to
>> avro with deflator set to level 6. The original logs consist of 2880
>> uncompressed http access logs with a total size of 1.4TB. The compressed
>> avro log is about 2/3 of the size.  However, when I ran the same pig job
>> on the raw logs, it was blazing fast during the initial map phase and
>> finished in under 40 min. When I ran the same pig job with avro files,
>> the initial map phase took 8 minutes to only finish 10%.  I am wondering
>> whether there is any way to figure out what is slowing down the map?
>
>What version of Avro are you using?  How are you integrating Avro with
>Pig?
>
>Also, for speed, you might try level=1 (Deflater.BEST_SPEED).
>
>Doug

