hbase-user mailing list archives

From Chris Tarnas <...@email.com>
Subject Re: GZ better than LZO?
Date Thu, 28 Jul 2011 17:42:34 GMT
During the load, did you add enough data to trigger a flush or compaction? In our cluster that
amount of data would not necessarily be enough to actually flush any store files. Performance
really depends on how the table's regions are laid out, the insert pattern, the number of
regionservers, and the amount of RAM allocated to each regionserver. If you don't see any flushes
or compactions in the log, try repeating the test, then flush the table and run a compaction
(or add more data so that it happens automatically), and time everything. It would be interesting
to see if the GZ benefit holds up.
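
One way to "time everything" is to wrap each phase (load, flush, compaction) in the same
wall-clock timer. The following is only a minimal sketch: `time_phases` and the three
placeholder functions are hypothetical names, and the placeholder bodies are assumptions
that would need to be replaced with the real test client and the real flush and compaction
calls for the cluster.

```python
import time
from typing import Callable, Dict


def time_phases(phases: Dict[str, Callable[[], None]]) -> Dict[str, float]:
    """Run each phase in insertion order; return wall-clock seconds per phase."""
    timings: Dict[str, float] = {}
    for name, run in phases.items():
        start = time.perf_counter()
        run()
        timings[name] = time.perf_counter() - start
    return timings


# Hypothetical placeholders: swap in the real load client and the real
# flush / compaction triggers for the table under test.
def load_data() -> None:
    time.sleep(0.01)


def flush_table() -> None:
    time.sleep(0.01)


def compact_table() -> None:
    time.sleep(0.01)


if __name__ == "__main__":
    results = time_phases(
        {"load": load_data, "flush": flush_table, "compact": compact_table}
    )
    for phase, seconds in results.items():
        print(f"{phase}: {seconds:.3f} s")
```

Running the same harness once per table (uncompressed, LZO, GZ) would give per-phase numbers
that can be compared directly, rather than only the client-side load time.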

-chris

On Jul 28, 2011, at 6:31 AM, Steinmaurer Thomas wrote:

> Hello,
> 
> 
> 
> we ran a test client generating data into GZ and LZO compressed tables.
> Equal data sets (number of rows: 1008000 and the same table schema). ~
> 7.78 GB disk space uncompressed in HDFS. LZO is ~ 887 MB whereas GZ is ~
> 444 MB, so basically half of LZO.
> 
> 
> 
> Execution time of the data generating client was 1373 seconds into the
> uncompressed table, 3374 sec. into LZO and 2198 sec. into GZ. The data
> generation client is based on HTablePool and using batch operations.
> 
> 
> 
> So in our (simple) test, GZ beats LZO in both disk usage and execution
> time of the client. We haven't tried reads yet.
> 
> 
> 
> Is this an expected result? I thought LZO is the recommended compression
> algorithm? Or does LZO outperform GZ with a growing amount of data or
> in read scenarios?
> 
> 
> 
> Regards,
> 
> Thomas
> 
> 
> 
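
For reference, the figures quoted above can be turned into compression ratios and insert
rates with a few lines of arithmetic. This is a sketch only: the names `RUNS` and `summarize`
are made up for illustration, and it assumes 1 GB = 1024 MB when converting the uncompressed
size.

```python
ROWS = 1_008_000

# Sizes and client run times as reported in the quoted message.
# Assumption: 1 GB = 1024 MB for the uncompressed figure.
RUNS = {
    "none": {"size_mb": 7.78 * 1024, "seconds": 1373},
    "LZO": {"size_mb": 887.0, "seconds": 3374},
    "GZ": {"size_mb": 444.0, "seconds": 2198},
}


def summarize(runs, rows, baseline="none"):
    """Per run: size relative to baseline, rows/sec, and time relative to baseline."""
    base = runs[baseline]
    summary = {}
    for name, run in runs.items():
        summary[name] = {
            "size_ratio": run["size_mb"] / base["size_mb"],
            "rows_per_sec": rows / run["seconds"],
            "slowdown": run["seconds"] / base["seconds"],
        }
    return summary


if __name__ == "__main__":
    for name, stats in summarize(RUNS, ROWS).items():
        print(
            f"{name:>4}: size {stats['size_ratio']:.1%} of uncompressed, "
            f"{stats['rows_per_sec']:.0f} rows/s, "
            f"{stats['slowdown']:.2f}x the uncompressed load time"
        )
```

These numbers describe only the client-side load as reported; they say nothing about flush,
compaction, or read cost.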

