hbase-issues mailing list archives

From "stack (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-4608) HLog Compression
Date Wed, 14 Mar 2012 00:08:44 GMT

     [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-4608:

    Attachment: 4608v23.txt

Renamed method enableCompression to setCompressionContext in all places.

Gave all instances of compression contexts the same name rather than a new name every time.

Removed unused 'compression' data member flags, and made them local variables where they
were only used by a single method.

Removed define of TRUE and repeat of ENABLE_WAL_COMPRESSION key from
SequenceFileLogReader.  No longer needed.

Rather than have the sequencefile metadata-making code sprinkled over the reader and writer,
do it all in the writer and have the reader use the writer's methods.

Added a global WAL type as metadata.

Added a compression type to metadata.

Renamed method WALCompressionEnabled as isWALCompressionEnabled.
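
As an illustration of the writer-owned metadata handling described above, here is a minimal
sketch. It uses a plain java.util.Map in place of the sequencefile metadata. The class names,
the key names (wal.type, compression.type) and the value "dictionary" are assumptions made up
for this example and are not taken from the patch; only the method name
isWALCompressionEnabled comes from the notes above.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for the log writer; only the metadata handling is sketched. */
final class SketchLogWriter {
  // Assumed key and value names, chosen for illustration only.
  static final String WAL_TYPE_KEY = "wal.type";
  static final String COMPRESSION_TYPE_KEY = "compression.type";
  static final String DICTIONARY_COMPRESSION = "dictionary";

  /** The writer is the only place that knows how the metadata is laid out. */
  static Map<String, String> createMetadata(String walType, boolean compress) {
    Map<String, String> metadata = new TreeMap<>();
    metadata.put(WAL_TYPE_KEY, walType);
    if (compress) {
      metadata.put(COMPRESSION_TYPE_KEY, DICTIONARY_COMPRESSION);
    }
    return metadata;
  }

  /** Interprets metadata produced by createMetadata(). */
  static boolean isWALCompressionEnabled(Map<String, String> metadata) {
    return DICTIONARY_COMPRESSION.equals(metadata.get(COMPRESSION_TYPE_KEY));
  }
}

/** Hypothetical reader: it asks the writer class instead of parsing keys itself. */
final class SketchLogReader {
  final boolean compressed;

  SketchLogReader(Map<String, String> fileMetadata) {
    this.compressed = SketchLogWriter.isWALCompressionEnabled(fileMetadata);
  }
}
```

With this shape, a change to the key names touches only the writer class, and the reader
cannot drift out of sync with it.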

Added some small tests to TestLRUDictionary and a new TestCompressor that taught me how this
stuff works.  Added documentation to methods where I was surprised; e.g. addEntry will happily
add a new entry even though the dictionary already has an entry for it.
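
The addEntry behavior noted above can be pictured with a small sketch. SketchDictionary below
is a hypothetical, simplified class written for this example: it has no LRU eviction and is
not the LRUDictionary from the patch. The findOrAdd helper is likewise an assumption, showing
one way a caller might guard against duplicates.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified dictionary; not the LRUDictionary class from the patch. */
final class SketchDictionary {
  static final short NOT_IN_DICTIONARY = -1;
  private final List<String> entries = new ArrayList<>();

  /** Appends unconditionally and returns the new index; there is no duplicate check. */
  short addEntry(String value) {
    entries.add(value);
    return (short) (entries.size() - 1);
  }

  /** Returns the index of the first matching entry, or NOT_IN_DICTIONARY. */
  short findEntry(String value) {
    return (short) entries.indexOf(value);
  }

  /** Caller-side guard: look up first, add only on a miss. */
  short findOrAdd(String value) {
    short idx = findEntry(value);
    return idx != NOT_IN_DICTIONARY ? idx : addEntry(value);
  }

  int size() {
    return entries.size();
  }

  public static void main(String[] args) {
    SketchDictionary dict = new SketchDictionary();
    dict.addEntry("usertable");
    dict.addEntry("usertable");  // stored a second time
    System.out.println("entries after two addEntry calls: " + dict.size());
    dict.findOrAdd("usertable"); // reuses the existing entry
    System.out.println("entries after findOrAdd: " + dict.size());
  }
}
```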

Miscellaneous cleanup.

I ran this compression on one of our production logs and it more than halved its size (to
about 44% of the original).  See below.  I then decompressed and recompressed it, and I got
the same compressed size back.

-rwxrwxrwx   1 stack  staff  28540761 Mar 13 16:47 sv4r25s8%3A60020.1331661889339.out.out.out
-rwxrwxrwx   1 stack  staff  64945799 Mar 13 16:45 sv4r25s8%3A60020.1331661889339.out.out
-rwxrwxrwx   1 stack  staff  28540761 Mar 13 16:44 sv4r25s8%3A60020.1331661889339.out
-rw-r--r--   1 stack  staff  64928728 Mar 13 16:25 sv4r25s8%3A60020.1331661889339
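
The sizes in the listing above can be checked with a few lines of arithmetic. The sketch
below only restates numbers from the listing; the class and method names are made up for the
example. Note that the listing also shows the decompressed file as 17071 bytes larger than
the original, which the message does not comment on.

```java
/** Arithmetic on the file sizes from the listing; names are illustrative only. */
final class WalSizeCheck {
  /** Compressed size as a fraction of the original size. */
  static double ratio(long compressedBytes, long originalBytes) {
    return (double) compressedBytes / originalBytes;
  }

  public static void main(String[] args) {
    long original = 64928728L;     // sv4r25s8%3A60020.1331661889339
    long compressed = 28540761L;   // .out
    long decompressed = 64945799L; // .out.out
    long recompressed = 28540761L; // .out.out.out

    System.out.printf("compressed/original = %.3f%n", ratio(compressed, original));
    System.out.println("recompressed == compressed: " + (recompressed == compressed));
    System.out.println("decompressed - original = " + (decompressed - original) + " bytes");
  }
}
```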

Will run more of our production logs through the compressor this evening to see if I can turn
up bugs.
> HLog Compression
> ----------------
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt,
>                      4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt,
>                      4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different
> datanodes. We can speed up this process by compressing the HLog. Current plan involves using
> a dictionary to compress table name, region id, cf name, and possibly other bits of repeated
> data. Also, HLog format may be changed in other ways to produce a smaller HLog.
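
To make the dictionary idea in the description concrete, here is a minimal sketch of one way
repeated fields such as table name, region id and cf name could be coded. It is an assumption
for illustration, not the format used by the patch: the first time a value is seen it is
written as a -1 marker, a 2-byte length and the raw bytes; every later occurrence is written
as a 2-byte dictionary index. The class name, the byte layout and the sample values are all
hypothetical, and the sketch has no eviction or capacity bound.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of dictionary coding for repeated WAL key fields. */
final class FieldDictionaryCodec {
  private static final short NOT_IN_DICTIONARY = -1;

  private final Map<String, Short> indexByValue = new HashMap<>(); // used when writing
  private final List<String> valueByIndex = new ArrayList<>();     // used when reading

  /** Writes a 2-byte index on a hit, or a -1 marker, length and raw bytes on a miss. */
  void write(ByteBuffer out, String value) {
    Short idx = indexByValue.get(value);
    if (idx != null) {
      out.putShort(idx);
      return;
    }
    byte[] raw = value.getBytes(StandardCharsets.UTF_8);
    out.putShort(NOT_IN_DICTIONARY);
    out.putShort((short) raw.length);
    out.put(raw);
    indexByValue.put(value, (short) indexByValue.size());
  }

  /** Mirrors write(): rebuilds the same dictionary, in the same order, while reading. */
  String read(ByteBuffer in) {
    short idx = in.getShort();
    if (idx != NOT_IN_DICTIONARY) {
      return valueByIndex.get(idx);
    }
    byte[] raw = new byte[in.getShort() & 0xFFFF];
    in.get(raw);
    String value = new String(raw, StandardCharsets.UTF_8);
    valueByIndex.add(value);
    return value;
  }

  public static void main(String[] args) {
    ByteBuffer buf = ByteBuffer.allocate(256);
    FieldDictionaryCodec writer = new FieldDictionaryCodec();
    // The same (table, region, family) triple repeated for three edits.
    for (int i = 0; i < 3; i++) {
      writer.write(buf, "usertable");
      writer.write(buf, "region-0001");
      writer.write(buf, "cf");
    }
    System.out.println("encoded bytes: " + buf.position());

    buf.flip();
    FieldDictionaryCodec reader = new FieldDictionaryCodec();
    for (int i = 0; i < 9; i++) {
      System.out.println(reader.read(buf));
    }
  }
}
```

The reader needs no separate copy of the dictionary because it sees the raw value at exactly
the point where the writer added it, so both sides assign the same index.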


