hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Question about compression
Date Fri, 06 Jul 2012 21:53:32 GMT


On Fri, Jul 6, 2012 at 3:21 AM, Christian Schäfer <syrious3000@yahoo.de> wrote:
> a) Where does compression (like snappy) actually occur.
> I set snappy to a column family and filled it with some data (30 MB) -> 640x480 array of 11-bit values.
> After flushing the memstore the size of the data kept exactly the same but flushing was 10x faster than flushing of the table without compression.
> So it's "only" the transfer that is compressed? Or are there possibilities to apply compression to the HFiles?

The files are compressed on flush/compact, and it's done per 64KB
block. I doubt the file was the same size as the memstore; check your
region server log, which reports the sizes for each flush.
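For reference, a sketch of how block compression is typically enabled per column family from the HBase shell. The table and family names below are placeholders, and the Snappy native libraries must already be installed on every region server:

```shell
# Sketch: enable Snappy compression on one column family.
# 'mytable' and 'cf' are hypothetical names; on 0.90.x the table
# must be disabled before it can be altered.
hbase shell <<'EOF'
disable 'mytable'
alter 'mytable', {NAME => 'cf', COMPRESSION => 'SNAPPY'}
enable 'mytable'
EOF

# Then look for the flush messages in the region server log
# (exact wording varies by version) to compare memstore size
# against the resulting HFile size:
grep -i flush /var/log/hbase/*regionserver*.log
```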

> (I'm still using 0.90.4-cdh3u2 because upgrading instructions seems quite tedious to

Stop everything, deploy new version, restart.
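Sketched as a shell sequence (paths are hypothetical and assume a tarball install; CDH packaging uses its own service scripts instead):

```shell
# Hedged sketch of the stop-the-world upgrade described above.
./bin/stop-hbase.sh      # stop the whole cluster
# ...deploy the new HBase release to every node...
./bin/start-hbase.sh     # restart on the new version
```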

> b) Are there some possibilities to apply delta-compression to HBase to minimize disk usage due to duplicated data?
> Has it to be added or even built or is it already included in HBase?

The first hit when googling "hbase delta compression" returns this:

As you can see it was included in 0.94 (no clue how that translates
for CDH... CDH5??)
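For the archives: in 0.94 this is exposed as a per-family DATA_BLOCK_ENCODING setting (with values like PREFIX, DIFF and FAST_DIFF). A hedged shell sketch, again with placeholder table and family names:

```shell
# Sketch: enable FAST_DIFF delta encoding on a 0.94+ cluster.
# 'mytable' and 'cf' are hypothetical names.
hbase shell <<'EOF'
disable 'mytable'
alter 'mytable', {NAME => 'cf', DATA_BLOCK_ENCODING => 'FAST_DIFF'}
enable 'mytable'
EOF
```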

There is also prefix compression in the pipeline:

Hope this helps,

