hbase-dev mailing list archives

From lars hofhansl <la...@apache.org>
Subject Re: Documenting Guidance on compression and codecs
Date Wed, 11 Sep 2013 20:30:13 GMT
PE (PerformanceEvaluation) has short, unique keys, so prefix encoding won't buy much (and may even make things worse).

What's interesting to me is the difference between SNAPPY and LZO. I expected them to be mostly
equivalent in terms of compression.

So as a general guideline I'd say:
o If you have long keys (compared to the values) or many columns, use a prefix encoder. Of the
prefix encoders, use only FAST_DIFF.
o If the values are large (and not already compressed, as images are), use a block compressor
(SNAPPY, LZO, GZIP, etc.).
o Use GZIP for cold data.
o Use SNAPPY or LZO for hot data.
o In most cases you do want to enable SNAPPY or LZO by default (low perf overhead + space
savings).
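
As a rough sketch, the guideline above could be written down as a small Python helper. The function name, its flags, and the tie-breaks (picking SNAPPY rather than LZO, and skipping block compression when values are already compressed) are assumptions made for illustration; none of this is an HBase API. The returned strings simply mirror the names used in this thread.

```python
def suggest_settings(long_keys_or_many_columns=False,
                     precompressed_values=False,
                     cold_data=False):
    """Return (data_block_encoding, compression) per the guideline above.

    Hypothetical helper: the flags and tie-breaks are illustrative assumptions.
    """
    # Long keys relative to values, or many columns: use FAST_DIFF only.
    encoding = "FAST_DIFF" if long_keys_or_many_columns else "NONE"

    if precompressed_values:
        # Assumption: already-compressed values (e.g. images) gain little
        # from a block compressor, so none is suggested.
        compression = "NONE"
    elif cold_data:
        # Cold data: favor the better ratio of GZIP (listed as GZ in the thread).
        compression = "GZ"
    else:
        # Hot data and the general default: SNAPPY (LZO would also fit).
        compression = "SNAPPY"
    return encoding, compression


if __name__ == "__main__":
    print(suggest_settings(long_keys_or_many_columns=True))
    print(suggest_settings(cold_data=True))
```

Treat the result as a starting point only; as the PE numbers in this thread show, the actual savings depend heavily on the shape of the keys and values.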

-- Lars



________________________________
From: Nick Dimiduk <ndimiduk@gmail.com>
To: hbase-dev <dev@hbase.apache.org> 
Sent: Wednesday, September 11, 2013 12:10 PM
Subject: Documenting Guidance on compression and codecs
 

Do we have a consolidated resource with information and recommendations
about the use of compression and codecs? For instance, I ran a simple test using
PerformanceEvaluation, examining just the size of data on disk for 1G of
input data. The matrix below has some surprising results:

+--------------------+--------------+
| MODIFIER           | SIZE (bytes) |
+--------------------+--------------+
| none               |   1108553612 |
+--------------------+--------------+
| compression:SNAPPY |    427335534 |
+--------------------+--------------+
| compression:LZO    |    270422088 |
+--------------------+--------------+
| compression:GZ     |    152899297 |
+--------------------+--------------+
| codec:PREFIX       |   1993910969 |
+--------------------+--------------+
| codec:DIFF         |   1960970083 |
+--------------------+--------------+
| codec:FAST_DIFF    |   1061374722 |
+--------------------+--------------+
| codec:PREFIX_TREE  |   1066586604 |
+--------------------+--------------+
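
To make the table easier to compare, here is a minimal Python sketch that expresses each size as a fraction of the unmodified baseline. The byte counts are copied from the table above; the variable names are just illustrative.

```python
# On-disk sizes in bytes, copied from the table above.
sizes = {
    "none": 1108553612,
    "compression:SNAPPY": 427335534,
    "compression:LZO": 270422088,
    "compression:GZ": 152899297,
    "codec:PREFIX": 1993910969,
    "codec:DIFF": 1960970083,
    "codec:FAST_DIFF": 1061374722,
    "codec:PREFIX_TREE": 1066586604,
}

baseline = sizes["none"]

# Ratio below 1.0 means the modifier saved space; above 1.0 means it cost space.
ratios = {name: size / baseline for name, size in sizes.items()}

# Print smallest first so the best space savers appear at the top.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{name:<20} {ratio:6.1%} of baseline")
```

Seen this way, the three block compressors all shrink the data, while PREFIX and DIFF come out larger than the unencoded baseline on this data set.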

Where does a wayward soul look for guidance on which combination of the
above to choose for their application?

Thanks,
Nick