hbase-issues mailing list archives

From "Li Pi (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4608) HLog Compression
Date Tue, 21 Feb 2012 19:49:47 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212850#comment-13212850 ]

Li Pi commented on HBASE-4608:

@Kannan - here's a quick overview of 4608:

When writing the HLog, the writer checks a set of dictionaries for the key, cf, qualifier, tablename,
and regionname. If an item is already in its dictionary, the writer emits the item's index instead
of the item itself. If the item is not in the dictionary, it is written uncompressed and added to the dictionary.

Reading from the HLog works in the opposite manner. When the reader encounters an uncompressed
item, it adds it to the dictionary. When it encounters an index, it fetches the corresponding item
from the dictionary.
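
To make the write and read paths above concrete, here is a minimal sketch for a single field. It is not the code from the attached patches. The class name, the tag bytes, the use of String values, and the unbounded dictionary (no eviction) are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the HBASE-4608 patch: one field, String values,
// an unbounded dictionary (no eviction), and made-up tag bytes.
class HLogDictRoundTrip {
    static final byte LITERAL = 0; // followed by int length + UTF-8 bytes
    static final byte INDEXED = 1; // followed by a 2-byte dictionary index

    /** Writer side: emit an index for values seen before, the raw value otherwise. */
    static byte[] encode(List<String> values) {
        Map<String, Short> indexOf = new HashMap<>(); // value -> index
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            for (String value : values) {
                Short idx = indexOf.get(value);
                if (idx != null) {
                    out.writeByte(INDEXED);
                    out.writeShort(idx);
                } else {
                    byte[] raw = value.getBytes(StandardCharsets.UTF_8);
                    out.writeByte(LITERAL);
                    out.writeInt(raw.length);
                    out.write(raw);
                    // Next free index; the sketch assumes fewer than 32768 distinct values.
                    indexOf.put(value, (short) indexOf.size());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** Reader side: rebuild the same dictionary from the literals it encounters. */
    static List<String> decode(byte[] data, int count) {
        List<String> byIndex = new ArrayList<>(); // index -> value
        List<String> values = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            for (int i = 0; i < count; i++) {
                if (in.readByte() == INDEXED) {
                    values.add(byIndex.get(in.readShort()));
                } else {
                    byte[] raw = new byte[in.readInt()];
                    in.readFully(raw);
                    String value = new String(raw, StandardCharsets.UTF_8);
                    byIndex.add(value); // same insertion order as the writer
                    values.add(value);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return values;
    }

    public static void main(String[] args) {
        List<String> regions = Arrays.asList("region-a", "region-b", "region-a", "region-a");
        byte[] bytes = encode(regions);
        System.out.println(decode(bytes, regions.size()).equals(regions));
        System.out.println(bytes.length + " bytes written");
    }
}
```

In this toy framing a repeated value costs 3 bytes (tag plus short) instead of the full literal, which is where the savings come from. The dictionary itself is never written to the log; the reader reconstructs it from the literals.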

The dictionary itself is a simple LRU dictionary that, by default, uses 2 bytes (a short) per index.
There is a separate dictionary for each field (e.g. one for tablenames,
one for regionnames...).

The dictionary only needs to be consistent: given the same items in the same order, it
should always assign them the same indices and always evict in exactly the same fashion,
so that the reader's dictionary mirrors the writer's.
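
As a rough illustration of the consistency requirement, here is one way a bounded LRU dictionary with short indices could behave deterministically. This is a sketch under assumptions, not the dictionary from the patches: the class and method names are hypothetical, values are Strings for brevity, and eviction simply reuses the index of the least recently used entry.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical LRU dictionary: names and eviction policy are assumptions.
class LruDictionarySketch {
    private final int capacity;
    private final String[] slots; // index -> value
    // value -> index; access order makes iteration run from least to most recently used
    private final LinkedHashMap<String, Short> indexOf = new LinkedHashMap<>(16, 0.75f, true);

    LruDictionarySketch(int capacity) {
        if (capacity < 1 || capacity > Short.MAX_VALUE) {
            throw new IllegalArgumentException("capacity must fit in a positive short");
        }
        this.capacity = capacity;
        this.slots = new String[capacity];
    }

    /** Writer-side lookup: returns the index or -1, and marks a hit as most recently used. */
    short findIndex(String value) {
        Short idx = indexOf.get(value); // get() reorders in access-order mode
        return idx == null ? -1 : idx;
    }

    /** Reader-side lookup: must touch the entry exactly as the writer's lookup did. */
    String getValue(short index) {
        String value = slots[index];
        if (value == null) {
            throw new IllegalStateException("no entry at index " + index);
        }
        indexOf.get(value); // touch
        return value;
    }

    /** Adds a value that is not present; when full, reuses the LRU entry's index. */
    short add(String value) {
        short idx;
        if (indexOf.size() < capacity) {
            idx = (short) indexOf.size();
        } else {
            Iterator<Map.Entry<String, Short>> it = indexOf.entrySet().iterator();
            idx = it.next().getValue(); // least recently used entry
            it.remove();
        }
        slots[idx] = value;
        indexOf.put(value, idx);
        return idx;
    }
}

class LruDictionaryDemo {
    /** Writer: a Short token for known values, the String itself otherwise. */
    static List<Object> compress(List<String> values, int capacity) {
        LruDictionarySketch dict = new LruDictionarySketch(capacity);
        List<Object> tokens = new ArrayList<>();
        for (String v : values) {
            short idx = dict.findIndex(v);
            if (idx >= 0) {
                tokens.add(Short.valueOf(idx));
            } else {
                dict.add(v);
                tokens.add(v);
            }
        }
        return tokens;
    }

    /** Reader: replays the same adds and touches, so its dictionary stays in step. */
    static List<String> decompress(List<Object> tokens, int capacity) {
        LruDictionarySketch dict = new LruDictionarySketch(capacity);
        List<String> values = new ArrayList<>();
        for (Object t : tokens) {
            if (t instanceof Short) {
                values.add(dict.getValue((Short) t));
            } else {
                dict.add((String) t);
                values.add((String) t);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "a", "c", "b");
        List<Object> tokens = compress(input, 2);
        System.out.println(tokens);
        System.out.println(decompress(tokens, 2).equals(input));
    }
}
```

The point of the sketch is that both sides perform the same sequence of adds and touches, so they evict the same entries and agree on every index. If the reader used a different capacity or skipped the touch on lookup, the two dictionaries would drift apart and indices would decode to the wrong items.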

This seems to work fairly well, and it noticeably cuts down our write sizes on the vast majority
of workloads.
> HLog Compression
> ----------------
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v5.txt, 4608v6.txt,
> 4608v7.txt, 4608v8fixed.txt
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different
> datanodes. We can speed up this process by compressing the HLog. Current plan involves using
> a dictionary to compress table name, region id, cf name, and possibly other bits of repeated
> data. Also, HLog format may be changed in other ways to produce a smaller HLog.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

