hbase-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (HBASE-2439) HBase can get stuck if updates to META are blocked
Date Wed, 14 Apr 2010 22:42:48 GMT

     [ https://issues.apache.org/jira/browse/HBASE-2439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-2439.

     Hadoop Flags: [Reviewed]
    Fix Version/s: 0.20.4
       Resolution: Fixed

Applied to the 0.20 branch as is.  Applied to 0.20_pre_durability and TRUNK w/o the change to
the table descriptor, as per Todd's suggestion (in the former to minimize change, and in the
latter because the patch failed since the limit had already been removed).  Thanks for the patch, Kannan.

> HBase can get stuck if updates to META are blocked
> --------------------------------------------------
>                 Key: HBASE-2439
>                 URL: https://issues.apache.org/jira/browse/HBASE-2439
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Kannan Muthukkaruppan
>             Fix For: 0.20.4, 0.20.5, 0.21.0
>         Attachments: 2439_0.20_dont_block_meta.txt
> (We noticed this on an import-style test in a small test cluster.)
> If compactions are running slow and we are doing a lot of region splits, then, since
> META has a much smaller hard-coded memstore flush size (16KB), it quickly accumulates lots
> of store files. Once the store file count exceeds "hbase.hstore.blockingStoreFiles", flushes
> to META become no-ops. This causes META's memstore footprint to grow. Once the footprint
> exceeds "hbase.hregion.memstore.block.multiplier * 16KB", we block further updates to META.
> In my test setup:
>   hbase.hregion.memstore.block.multiplier = 4.
> and,
>   hbase.hstore.blockingStoreFiles = 15.
> And we saw messages of the form:
> {code}
> 2010-04-09 18:37:39,539 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 23 on 60020' on region .META.,,1: memstore size 64.2k is >= than blocking 64.0k size
> {code}
> Now suppose that, around the same time, the CompactSplitThread does a compaction and
> determines it is going to split the region. As part of finishing the split, it wants to
> update META about the daughter regions.
> It'll end up waiting for META to become unblocked. The single CompactSplitThread
> is now held up, and no further compactions can proceed.  META's compaction request is itself
> blocked because the compaction queue will never get cleared.
> This essentially creates a deadlock, and the region server is not able to progress any
> further. Eventually, each region server's CompactSplitThread ends up in the same state.
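
The conditions in the description above can be sketched as a small toy model. This is not HBase code: the class and method names (MetaBlockingSketch, blockingThreshold, flushIsNoOp, updatesBlocked, isStuck) are hypothetical, and the store file count of 16 and the byte count used for "64.2k" are illustrative assumptions. Only the 16KB flush size, the multiplier of 4, and the blockingStoreFiles value of 15 come from the report.

```java
/** Toy model of the stuck state described in HBASE-2439; hypothetical names, not HBase code. */
final class MetaBlockingSketch {

    /** Memstore size at which updates block: flush size times the block multiplier. */
    static long blockingThreshold(long flushSizeBytes, int blockMultiplier) {
        return flushSizeBytes * blockMultiplier;
    }

    /** Per the report, flushes become no-ops once the store file count exceeds the limit. */
    static boolean flushIsNoOp(int storeFiles, int blockingStoreFiles) {
        return storeFiles > blockingStoreFiles;
    }

    /** Per the log message, updates block once the memstore size is >= the threshold. */
    static boolean updatesBlocked(long memstoreBytes, long thresholdBytes) {
        return memstoreBytes >= thresholdBytes;
    }

    /**
     * The region server is stuck when the single compaction/split worker is waiting
     * on a META update, META updates are blocked, and META flushes are no-ops, so
     * only a META compaction could help and that compaction is queued behind the
     * waiting worker.
     */
    static boolean isStuck(boolean workerWaitingOnMetaUpdate,
                           boolean metaUpdatesBlocked,
                           boolean metaFlushNoOp) {
        return workerWaitingOnMetaUpdate && metaUpdatesBlocked && metaFlushNoOp;
    }

    public static void main(String[] args) {
        long flushSize = 16 * 1024;   // META's hard-coded flush size (16KB), from the report
        int multiplier = 4;           // hbase.hregion.memstore.block.multiplier in the test setup
        int blockingStoreFiles = 15;  // hbase.hstore.blockingStoreFiles in the test setup

        long threshold = blockingThreshold(flushSize, multiplier); // 65536 bytes = 64KB
        long memstore = 65_741L;      // assumed byte count, roughly the "64.2k" in the log
        int storeFiles = 16;          // assumed count, one above the limit

        boolean blocked = updatesBlocked(memstore, threshold);
        boolean noOp = flushIsNoOp(storeFiles, blockingStoreFiles);

        System.out.println("blocking threshold (bytes): " + threshold);
        System.out.println("META updates blocked: " + blocked);
        System.out.println("META flush is a no-op: " + noOp);
        System.out.println("stuck if worker waits on META: " + isStuck(true, blocked, noOp));
    }
}
```

With the values from the test setup, the threshold works out to 4 * 16KB = 64KB, which matches the "blocking 64.0k size" in the log message.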

This message is automatically generated by JIRA.
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira