hbase-issues mailing list archives

From "Nicolas Spiegelberg (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4608) HLog Compression
Date Tue, 24 Jan 2012 22:12:42 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192604#comment-13192604 ]

Nicolas Spiegelberg commented on HBASE-4608:
--------------------------------------------

If we want to avoid scanning the entire log and seek instead as an optimization, I think we
should put more effort into rolling logs at a lower size threshold, making log GC size-based,
and getting rid of (or greatly raising) the file-count-based pressure.

In production, the major bottleneck for us in log replay (after distributed log splitting)
has been IO.  We normally don't max out CPU.  Anything we can do to reduce IO size at the
expense of CPU would help reduce replay time.

As an aside, do we currently compress the output of our log split?  Writing the resulting
per-region logs in LZO or GZ format will decrease our replay time, perhaps more than this
optimization will.  That said, this feature is very useful; I just want to make sure that
we're not missing less cool but potentially more beneficial optimizations.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different
> datanodes. We can speed up this process by compressing the HLog. Current plan involves using
> a dictionary to compress table name, region id, cf name, and possibly other bits of repeated
> data. Also, HLog format may be changed in other ways to produce a smaller HLog.
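
The dictionary idea in the issue description could be sketched roughly as below. This is only
an illustration, not the code in the attached patches: the class name FieldDictionary, the
NOT_IN_DICTIONARY marker, the two-byte index, and the use of UTF strings for fields are all
assumptions made for the example. The writer emits a short index for a field it has already
seen and the literal value otherwise; the reader rebuilds the same dictionary as it goes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of dictionary compression for repeated WAL fields. */
class FieldDictionary {
    // Assumed marker meaning "a literal value follows"; not an HBase constant.
    private static final short NOT_IN_DICTIONARY = -1;

    private final Map<String, Short> indexByValue = new HashMap<>();
    private final List<String> valueByIndex = new ArrayList<>();

    /** Writes a two-byte index if the field was seen before, else the literal. */
    void write(DataOutputStream out, String field) throws IOException {
        Short index = indexByValue.get(field);
        if (index != null) {
            out.writeShort(index);
            return;
        }
        out.writeShort(NOT_IN_DICTIONARY);
        out.writeUTF(field);
        add(field);
    }

    /** Mirrors write(): resolves an index, or reads and remembers a literal. */
    String read(DataInputStream in) throws IOException {
        short index = in.readShort();
        if (index != NOT_IN_DICTIONARY) {
            return valueByIndex.get(index);
        }
        String field = in.readUTF();
        add(field);
        return field;
    }

    // Reader and writer apply the same rule, so their dictionaries stay in sync.
    // This sketch never evicts; it just stops adding once the index space is full.
    private void add(String field) {
        if (valueByIndex.size() < Short.MAX_VALUE) {
            indexByValue.put(field, (short) valueByIndex.size());
            valueByIndex.add(field);
        }
    }
}
```

In this sketch a repeated table name, region id, or cf name costs two bytes after its first
appearance instead of its full length. The reader has to see entries in the order they were
written, which is why seeking into the middle of a compressed log is not straightforward.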
