hbase-user mailing list archives

From "Steinmaurer Thomas" <Thomas.Steinmau...@scch.at>
Subject RE: GZ better than LZO?
Date Fri, 29 Jul 2011 07:36:23 GMT
Hi Chris!

Your questions are somewhat hard for me to answer, because I'm not really
in charge of the test cluster from an administration/setup POV.

Basically, when running:

I see 7 region servers, each with a "maxHeap" value of 995.

When clicking on the different tables depending on the compression type,
I get the following information:

GZ compressed table: 3 regions hosted by one region server
LZO compressed table: 8 regions hosted by two region servers, where the
start region is hosted by one region server and all other 7 regions are
hosted on the second region server

Regarding the insert pattern etc., please have a look at my reply to
Chiku, where I describe the test data generator, the table layout, etc.
a bit.


-----Original Message-----
From: Christopher Tarnas [mailto:cft@tarnas.org] On Behalf Of Chris
Sent: Thursday, 28 July 2011 19:43
To: user@hbase.apache.org
Subject: Re: GZ better than LZO?

During the load, did you add enough data to trigger a flush or compaction?
In our cluster that amount of data inserted would not necessarily be
enough to actually flush store files. Performance really depends on how
the table's regions are laid out, the insert pattern, the number of
regionservers and the amount of RAM allocated to each regionserver. If
you don't see any flushes or compactions in the log, try repeating the
test, then flushing the table and running a compaction (or adding more
data so it happens automatically), and timing everything. It would be
interesting to see if the GZ benefit holds up.
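
To make "timing everything" concrete, here is a minimal sketch of a harness that records the wall-clock time of each phase separately. The three phase functions are hypothetical placeholders, not part of any HBase API; they would be replaced with whatever actually runs the data generator and issues the flush and compaction for the table (for example via the HBase shell's `flush` and `major_compact` commands).

```python
import time


def timed(label, step, timings):
    """Run step() and record its wall-clock duration under label."""
    start = time.perf_counter()
    step()
    timings[label] = time.perf_counter() - start


# Hypothetical placeholders: swap in the real actions for your cluster.
def load_data():
    pass  # e.g. run the test data generator against one table


def flush_table():
    pass  # e.g. trigger a flush of that table


def major_compact():
    pass  # e.g. trigger a major compaction of that table


def run_benchmark(steps):
    """Time each (label, callable) pair in order and return the durations."""
    timings = {}
    for label, step in steps:
        timed(label, step, timings)
    return timings


timings = run_benchmark(
    [("load", load_data), ("flush", flush_table), ("compact", major_compact)]
)
for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f} s")
```

A compaction request may return before the work has finished, so the regionserver log is still the place to confirm that the flush and compaction really completed before comparing numbers.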


On Jul 28, 2011, at 6:31 AM, Steinmaurer Thomas wrote:

> Hello,
> we ran a test client generating data into a GZ and an LZO compressed table.
> Equal data sets (number of rows: 1008000 and the same table schema). ~
> 7.78 GB disk space uncompressed in HDFS. LZO is ~ 887 MB whereas GZ is
> ~ 444 MB, so basically half of LZO.
> Execution time of the data generating client was 1373 seconds into the
> uncompressed table, 3374 sec. into LZO and 2198 sec. into GZ. The data
> generation client is based on HTablePool and using batch operations.
> So in our (simple) test, GZ beats LZO in both disk usage and
> execution time of the client. We haven't tried reads yet.
> Is this an expected result? I thought LZO is the recommended
> compression algorithm? Or does LZO outperform GZ with a growing
> amount of data or in read scenarios?
> Regards,
> Thomas
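
The figures quoted above can be put side by side with a little arithmetic. The sketch below only uses the numbers from the message; the one assumption is that "GB" means 1024 MB, which shifts the compression ratios slightly but not the comparison.

```python
# Figures quoted in the original message.
ROWS = 1_008_000
UNCOMPRESSED_MB = 7.78 * 1024  # assumption: "GB" means 1024 MB

runs = {
    # name: (size on disk in MB, client run time in seconds)
    "none": (UNCOMPRESSED_MB, 1373),
    "LZO": (887, 3374),
    "GZ": (444, 2198),
}


def summarize(size_mb, seconds):
    """Return (compression ratio, rows per second, run time vs. uncompressed)."""
    baseline_seconds = runs["none"][1]
    return (UNCOMPRESSED_MB / size_mb, ROWS / seconds, seconds / baseline_seconds)


summary = {name: summarize(*values) for name, values in runs.items()}

for name, (ratio, rows_per_s, slowdown) in summary.items():
    print(
        f"{name:>4}: {ratio:5.1f}x smaller, "
        f"{rows_per_s:6.0f} rows/s, {slowdown:.2f}x run time"
    )
```

On these numbers the GZ table is about half the size of the LZO table, and the client ran at roughly 459 rows/s into GZ versus roughly 299 rows/s into LZO. These are single-run client timings, so they say nothing yet about flushes, compactions, or reads.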
