hbase-issues mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5891) Change Compression Based on Type of Compaction
Date Sat, 05 May 2012 18:25:50 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269029#comment-13269029 ]

Andrew Purtell commented on HBASE-5891:

It used to be possible (circa 0.90) to use one compression algorithm for flushes and
minor compactions and a different one for major compactions. I added this because we had a
case under consideration where data would grow colder in proportion to the delta between
current time and write time. It was simple and low impact to set flushes and minor compactions
to LZO and major compactions to BZIP2 (we flirted with LZMA, but it is simply too bandwidth
constrained), and a script would trigger region-by-region major compaction daily. I don't
know if this is maintained in the current code base. Compaction was significantly reworked
from 0.90 to 0.92, and we didn't pick up the majority of these changes in our internal version.
> Change Compression Based on Type of Compaction
> ----------------------------------------------
>                 Key: HBASE-5891
>                 URL: https://issues.apache.org/jira/browse/HBASE-5891
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Nicolas Spiegelberg
>            Priority: Minor
> We currently use LZO on our production systems because the on-demand decompression speed
> of GZ is too slow. That said, many of our major-compacted StoreFiles are infrequently read
> because of lazy seek optimizations, but they occupy the majority of our disk space. One idea
> is to change the type of compression depending upon compaction characteristics (input size
> or major compaction flag). This would allow us to have our largest and least-read files be
> GZ compressed and save space.
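
The selection rule proposed in the description (pick the codec from the major compaction flag or the input size) could be sketched roughly as below. This is an illustration only, not code from any HBase release: the class name, the enum, the method, and the 1 GiB threshold are all hypothetical, and the sketch does not touch HBase's actual compaction or compression classes.

```java
// Hypothetical sketch: none of these names are HBase APIs.
final class CompactionCompressionChooser {

    /** Stand-in for a codec identifier; HBase's own compression types are not used here. */
    enum Algorithm { LZO, GZ }

    // Assumed cutoff: compactions whose inputs total at least this many bytes
    // are treated as "large". The value (1 GiB) is arbitrary.
    static final long LARGE_INPUT_BYTES = 1L << 30;

    /**
     * Chooses a codec for the output file of a compaction.
     * Major compactions and large-input compactions get the denser, slower codec (GZ);
     * everything else (flushes, small minor compactions) keeps the faster codec (LZO).
     */
    static Algorithm choose(boolean majorCompaction, long inputBytes) {
        if (inputBytes < 0) {
            throw new IllegalArgumentException("inputBytes must be non-negative");
        }
        if (majorCompaction || inputBytes >= LARGE_INPUT_BYTES) {
            return Algorithm.GZ;
        }
        return Algorithm.LZO;
    }

    public static void main(String[] args) {
        System.out.println(choose(false, 64L << 20)); // 64 MiB minor compaction
        System.out.println(choose(true, 64L << 20));  // major compaction of the same size
        System.out.println(choose(false, 2L << 30));  // 2 GiB minor compaction
    }
}
```

Whether the flag, the size, or both should drive the choice is exactly the open question in the issue; the sketch simply treats either condition as sufficient.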
