hbase-dev mailing list archives

From Stack <st...@duboce.net>
Subject Re: Documenting Guidance on compression and codecs
Date Wed, 11 Sep 2013 21:29:31 GMT
Is this stuff going to make it into the refguide?  It's good stuff.
St.Ack


On Wed, Sep 11, 2013 at 1:30 PM, lars hofhansl <larsh@apache.org> wrote:

> PE (PerformanceEvaluation) has short and unique keys, so any prefix
> encoding won't buy much (and may even make things worse).
>
> What's interesting to me is the difference between SNAPPY and LZO; I
> expected them to be mostly equivalent in terms of compression.
>
> So as a general guideline I'd say:
> o If you have long keys (compared to the values) or many columns, use a
> prefix encoder. Of the available encoders, use only FAST_DIFF.
> o If the values are large (and not precompressed as in images), use a
> block compressor (SNAPPY, LZO, GZIP, etc)
> o Use GZIP for cold data
> o Use SNAPPY or LZO for hot data.
> o In most cases you do want to enable SNAPPY or LZO by default (low perf
> overhead + space savings).
>
> -- Lars
>
>
>
> ________________________________
>  From: Nick Dimiduk <ndimiduk@gmail.com>
> To: hbase-dev <dev@hbase.apache.org>
> Sent: Wednesday, September 11, 2013 12:10 PM
> Subject: Documenting Guidance on compression and codecs
>
>
> Do we have a consolidated resource with information and recommendations
> about the use of compression and codecs? For instance, I ran a simple test
> using PerformanceEvaluation, examining just the size of data on disk for 1G
> of input data. The matrix below has some surprising results:
>
> +--------------------+--------------+
> | MODIFIER           | SIZE (bytes) |
> +--------------------+--------------+
> | none               |   1108553612 |
> +--------------------+--------------+
> | compression:SNAPPY |    427335534 |
> +--------------------+--------------+
> | compression:LZO    |    270422088 |
> +--------------------+--------------+
> | compression:GZ     |    152899297 |
> +--------------------+--------------+
> | codec:PREFIX       |   1993910969 |
> +--------------------+--------------+
> | codec:DIFF         |   1960970083 |
> +--------------------+--------------+
> | codec:FAST_DIFF    |   1061374722 |
> +--------------------+--------------+
> | codec:PREFIX_TREE  |   1066586604 |
> +--------------------+--------------+
>
> Where does a wayward soul look for guidance on which combination of the
> above to choose for their application?
>
> Thanks,
> Nick
>
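
To make the size matrix in Nick's message easier to compare, here is a minimal sketch that expresses each row as a fraction of the uncompressed ("none") row. The byte counts are copied from the table; the names `SIZES` and `relative_sizes` are made up for this illustration and are not part of HBase or PerformanceEvaluation.

```python
# On-disk sizes in bytes, copied verbatim from the table in Nick's message.
SIZES = {
    "none": 1108553612,
    "compression:SNAPPY": 427335534,
    "compression:LZO": 270422088,
    "compression:GZ": 152899297,
    "codec:PREFIX": 1993910969,
    "codec:DIFF": 1960970083,
    "codec:FAST_DIFF": 1061374722,
    "codec:PREFIX_TREE": 1066586604,
}


def relative_sizes(sizes, baseline="none"):
    """Return each size divided by the baseline size (hypothetical helper)."""
    base = sizes[baseline]
    return {name: size / base for name, size in sizes.items()}


if __name__ == "__main__":
    ratios = relative_sizes(SIZES)
    # Print the smallest footprint first.
    for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
        print(f"{name:<20} {ratio:5.2f}x of uncompressed")
```

On these numbers GZ comes out at roughly 14% of the uncompressed size, LZO at about 24%, and SNAPPY at about 39%, while PREFIX and DIFF are close to 1.8x larger than no encoding at all. This covers a single PerformanceEvaluation run with short, unique keys, so the ratios should not be read as general.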

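Lars's rules of thumb can also be written down as a small decision function. This is only a sketch of one reading of the guideline, not an HBase API: `suggest_settings` and its parameters are hypothetical, SNAPPY is picked arbitrarily where the guideline says "SNAPPY or LZO", and skipping compression for precompressed values is an assumption drawn from the "not precompressed" remark rather than something the thread states outright.

```python
def suggest_settings(long_keys_or_many_columns, hot_data, values_precompressed=False):
    """Map the guideline from the thread onto column family settings.

    Hypothetical helper for illustration; the returned keys mirror the
    DATA_BLOCK_ENCODING and COMPRESSION column family attributes.
    """
    # Long keys (relative to values) or many columns: use FAST_DIFF.
    encoding = "FAST_DIFF" if long_keys_or_many_columns else "NONE"

    if values_precompressed:
        # Assumption: already-compressed values (e.g. images) gain little
        # from a block compressor, so none is suggested here.
        compression = "NONE"
    elif hot_data:
        # Guideline: SNAPPY or LZO for hot data; SNAPPY chosen arbitrarily.
        compression = "SNAPPY"
    else:
        # Guideline: GZIP (GZ in the table above) for cold data.
        compression = "GZ"

    return {"DATA_BLOCK_ENCODING": encoding, "COMPRESSION": compression}


if __name__ == "__main__":
    print(suggest_settings(long_keys_or_many_columns=True, hot_data=True))
    print(suggest_settings(long_keys_or_many_columns=False, hot_data=False))
```

The result is a starting point to measure against real data, in line with the thread's own caveat that PerformanceEvaluation keys are not representative.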