hadoop-common-user mailing list archives

From James Seigel <ja...@tynt.com>
Subject Re: Comparison between Gzip and LZO
Date Thu, 03 Mar 2011 03:15:36 GMT
Slightly off-topic for this conversation, but I thought it worth mentioning: LZO is splittable,
which makes it a good fit for Hadoop-y things.  Just something to remember when you do get
some final results on performance.

Cheers
James.


On 2011-03-02, at 8:12 PM, Brian Bockelman wrote:

> 
> I think some profiling is in order: claiming LZO decompresses at 1.0MB/s and is more
> than 3x faster at compression than decompression (especially when it's a well known asymmetric
> algorithm in favor of decompression speed) is somewhat unbelievable.
> 
> I see that you use small files.  Maybe whatever you do for LZO and Gzip/Hadoop has a
> large startup overhead?
> 
> Again, sounds like you'll be spending an hour or so with a profiler.
> 
> Brian
> 
> On Mar 2, 2011, at 2:16 PM, Niels Basjes wrote:
> 
>> Question: Are you 100% sure that nothing else was running on that
>> system during the tests?
>> No cron jobs, no "makewhatis" or "updatedb"?
>> 
>> P.S. There is a permission issue with downloading one of the files.
>> 
>> 2011/3/2 José Vinícius Pimenta Coletto <jvcoletto@gmail.com>:
>>> Hi,
>>> 
>>> I'm comparing the following compression methods: gzip and lzo as
>>> provided by Hadoop, and gzip from the java.util.zip package.
>>> The test consists of compressing and decompressing approximately 92,000
>>> files with an average size of 2 KB. However, the decompression time of lzo
>>> is well above the decompression time of the gzip provided by Hadoop, which
>>> does not seem right.
>>> The results obtained in the test are:
>>> 
>>> Method               | Bytes     | Compression                                         | Decompression
>>>                      |           | Total time (with I/O)  Time      Speed              | Total time (with I/O)  Time      Speed
>>> Gzip (Hadoop)        | 200876304 | 121.454s               43.167s   4,653,424.079 B/s  | 332.305s               111.806s  1,796,635.326 B/s
>>> Lzo                  | 200876304 | 120.564s               54.072s   3,714,914.621 B/s  | 509.371s               184.906s  1,086,368.904 B/s
>>> Gzip (java.util.zip) | 200876304 | 148.014s               63.414s   3,167,647.371 B/s  | 483.148s               4.528s    44,360,682.244 B/s
>>> 
>>> You can see the code I'm using for the test here:
>>> http://www.linux.ime.usp.br/~jvcoletto/compression/
>>> 
>>> Can anyone explain why I am getting these results?
>>> Thanks.
>>> 
>> 
>> 
>> 
>> -- 
>> Met vriendelijke groeten,
>> 
>> Niels Basjes
> 
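
For anyone who wants to check the startup-overhead theory raised above before reaching for a profiler, here is a minimal sketch of timing decompression on its own, with no disk I/O and with a JIT warm-up before the timed loop. It only uses java.util.zip, not the Hadoop codecs, and it is not the code from the original test. The class name, the helper methods, the synthetic 2 KB payload and the scaled-down file count are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Random;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class SmallFileCodecTiming {

    // Hypothetical stand-in for one ~2 KB input file: low-entropy bytes
    // so that gzip has something to compress.
    static byte[] samplePayload(int size, long seed) {
        Random rnd = new Random(seed);
        byte[] data = new byte[size];
        for (int i = 0; i < size; i++) {
            data[i] = (byte) ('a' + rnd.nextInt(8));
        }
        return data;
    }

    // Compress a buffer entirely in memory.
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        }
        return out.toByteArray();
    }

    // Decompress a buffer entirely in memory.
    static byte[] gunzip(byte[] packed) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(packed))) {
            int n;
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return out.toByteArray();
    }

    // Uncompressed bytes per second, the unit used in the table.
    static double bytesPerSecond(long bytes, long nanos) {
        return bytes / (nanos / 1e9);
    }

    public static void main(String[] args) throws IOException {
        int files = 10_000; // assumption: scaled down from the ~92,000 files in the test
        byte[] raw = samplePayload(2048, 42L);
        byte[] packed = gzip(raw);

        // Warm-up so JIT compilation does not land inside the timed loop.
        for (int i = 0; i < 2_000; i++) {
            gunzip(packed);
        }

        long totalBytes = 0;
        long start = System.nanoTime();
        for (int i = 0; i < files; i++) {
            totalBytes += gunzip(packed).length;
        }
        long elapsed = System.nanoTime() - start;

        System.out.printf("decompressed %d bytes in %.3fs (%.0f B/s)%n",
                totalBytes, elapsed / 1e9, bytesPerSecond(totalBytes, elapsed));
    }
}
```

Running the same loop with one stream set up per small buffer versus one stream over a single large buffer should give a rough idea of how much of the measured time is per-file setup rather than the codec itself.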

