hbase-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2439) HBase can get stuck if updates to META are blocked
Date Wed, 14 Apr 2010 05:26:53 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856757#action_12856757 ]

Todd Lipcon commented on HBASE-2439:

[From a code review standpoint, +1 on the patch. I think we should commit the whole thing
for the durability branch, and everything but the table descriptor change for the 0.20.4 branch.]

> HBase can get stuck if updates to META are blocked
> --------------------------------------------------
>                 Key: HBASE-2439
>                 URL: https://issues.apache.org/jira/browse/HBASE-2439
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Kannan Muthukkaruppan
>         Attachments: 2439_0.20_dont_block_meta.txt
> (We noticed this on an import-style test in a small test cluster.)
> If compactions are running slowly and we are doing a lot of region splits, then, since
> META has a much smaller hard-coded memstore flush size (16KB), it quickly accumulates lots
> of store files. Once the store file count exceeds "hbase.hstore.blockingStoreFiles", flushes
> to META become no-ops. This causes META's memstore footprint to grow. Once the footprint
> exceeds "hbase.hregion.memstore.block.multiplier * 16KB", we block further updates to META.
> In my test setup:
>   hbase.hregion.memstore.block.multiplier = 4.
> and,
>   hbase.hstore.blockingStoreFiles = 15.
> And we saw messages of the form:
> {code}
> 2010-04-09 18:37:39,539 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 23 on 60020' on region .META.,,1: memstore size 64.2k is >= than blocking 64.0k size
> {code}
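
The arithmetic behind that log line can be checked with a small sketch. This is not HBase code: the class and method names below are hypothetical, and the only inputs taken from the report are the 16KB META flush size, the multiplier of 4, and the 64.2k memstore size (converted to bytes approximately).

```java
// Hypothetical helper for illustration only; not part of HBase.
final class MetaBlockingThreshold {

    // Blocking size as described in the report: flush size times
    // hbase.hregion.memstore.block.multiplier.
    static long blockingSize(long flushSizeBytes, int blockMultiplier) {
        return flushSizeBytes * blockMultiplier;
    }

    // Updates are blocked once the memstore reaches the blocking size
    // (the log line reports the comparison as ">=").
    static boolean updatesBlocked(long memstoreBytes, long flushSizeBytes, int blockMultiplier) {
        return memstoreBytes >= blockingSize(flushSizeBytes, blockMultiplier);
    }

    public static void main(String[] args) {
        long metaFlushSize = 16L * 1024; // 16KB hard-coded META flush size (from the report)
        int multiplier = 4;              // value used in the reporter's test setup

        long blocking = blockingSize(metaFlushSize, multiplier);
        System.out.println("blocking size (bytes): " + blocking);

        // 64.2k from the log line, converted to bytes approximately.
        long memstore = Math.round(64.2 * 1024);
        System.out.println("updates blocked: " + updatesBlocked(memstore, metaFlushSize, multiplier));
    }
}
```

With these inputs the blocking size comes out at 64KB, which matches the "blocking 64.0k size" in the log message.
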
> Now suppose that, around the same time, the CompactSplitThread does a compaction and
> determines it is going to split the region. As part of finishing the split, it wants to
> update META about the daughter regions.
> It'll end up waiting for META to become unblocked. The single CompactSplitThread
> is now held up, and no further compactions can proceed. META's compaction request is itself
> blocked because the compaction queue will never get cleared.
> This essentially creates a deadlock, and the region server is not able to make any
> further progress. Eventually, each region server's CompactSplitThread ends up in the same state.
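
The stall described above can be reproduced in miniature with plain java.util.concurrent primitives. This is a sketch under stated assumptions, not HBase code: the single-threaded pool stands in for the one CompactSplitThread, the latch stands in for "META accepts updates again", and every name is hypothetical. The two-worker run is there only for contrast; it is not meant to describe what the attached patch does.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical model of the stall; none of these names exist in HBase.
final class SingleWorkerStall {

    // Returns true if the queued "META compaction" ran within the timeout.
    static boolean metaCompactionRan(int workers, long timeoutMillis) throws InterruptedException {
        CountDownLatch metaUnblocked = new CountDownLatch(1);
        ExecutorService compactSplitPool = Executors.newFixedThreadPool(workers);
        try {
            // Task 1: finishing a split. It must update META, so it waits
            // until META accepts updates again.
            compactSplitPool.submit(() -> {
                try {
                    metaUnblocked.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            // Task 2: the META compaction that would unblock updates,
            // queued behind the split.
            compactSplitPool.submit(metaUnblocked::countDown);

            return metaUnblocked.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } finally {
            // Interrupt the waiting task so the sketch always terminates.
            compactSplitPool.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("one worker, META compaction ran: " + metaCompactionRan(1, 200));
        System.out.println("two workers, META compaction ran: " + metaCompactionRan(2, 200));
    }
}
```

With one worker, the split task occupies the only thread while waiting on work that sits behind it in the same queue, so the wait can only end by timeout. That is the same shape as the CompactSplitThread waiting on META while META's compaction request waits on the CompactSplitThread.
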

This message is automatically generated by JIRA.
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

