hbase-dev mailing list archives

From "Kannan Muthukkaruppan (JIRA)" <j...@apache.org>
Subject [jira] Created: (HBASE-2439) HBase can get stuck if updates to META are blocked
Date Tue, 13 Apr 2010 21:55:51 GMT
HBase can get stuck if updates to META are blocked

                 Key: HBASE-2439
                 URL: https://issues.apache.org/jira/browse/HBASE-2439
             Project: Hadoop HBase
          Issue Type: Bug
            Reporter: Kannan Muthukkaruppan

(We noticed this on an import-style test in a small test cluster.)

If compactions are running slowly and we are doing a lot of region splits, then META, which
has a much smaller hard-coded memstore flush size (16KB), quickly accumulates lots of store
files. Once the number of store files exceeds "hbase.hstore.blockingStoreFiles", flushes to
META become no-ops. This causes META's memstore footprint to grow. Once the footprint exceeds
"hbase.hregion.memstore.block.multiplier * 16KB", we block further updates to META.

In my test setup:
  hbase.hregion.memstore.block.multiplier = 4.
  hbase.hstore.blockingStoreFiles = 15.

And we saw messages of the form:

2010-04-09 18:37:39,539 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates
for 'IPC Server handler 23 on 60020' on region .META.,,1: memstore size 64.2k is >= than
blocking 64.0k size
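
The numbers in that message line up with the settings above: 16KB * 4 = 64KB, and a
64.2k memstore is just over that. A minimal sketch of the arithmetic follows. The class
and method names are hypothetical and are not HBase code; only the 16KB flush size, the
multiplier of 4, and the 64.2k memstore size come from this report.

```java
class MetaBlockingThreshold {
    // Hypothetical helper: memstore size (in bytes) at which updates get blocked,
    // i.e. flush size times hbase.hregion.memstore.block.multiplier.
    static long blockingSizeBytes(long flushSizeBytes, int blockMultiplier) {
        return flushSizeBytes * blockMultiplier;
    }

    // Hypothetical helper: mirrors the ">=" comparison shown in the log message.
    static boolean updatesBlocked(long memstoreSizeBytes, long flushSizeBytes, int blockMultiplier) {
        return memstoreSizeBytes >= blockingSizeBytes(flushSizeBytes, blockMultiplier);
    }

    public static void main(String[] args) {
        long metaFlushSize = 16 * 1024;         // hard-coded META flush size from the report
        int multiplier = 4;                     // value used in the test setup
        long memstore = (long) (64.2 * 1024);   // memstore size from the log message

        System.out.println("blocking size (bytes): " + blockingSizeBytes(metaFlushSize, multiplier));
        System.out.println("updates blocked: " + updatesBlocked(memstore, metaFlushSize, multiplier));
    }
}
```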

Now suppose that, around the same time, the CompactSplitThread does a compaction and
determines that it is going to split the region. As part of finishing the split, it wants
to update META about the daughter regions.

It'll end up waiting for the META to become unblocked. The single CompactSplitThread is now
held up, and no further compactions can proceed.  META's compaction request is itself blocked
because the compaction queue will never get cleared.

This essentially creates a deadlock and the region server is not able to progress any further.
Eventually, each region server's CompactSplitThread ends up in the same state.
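
The shape of the stall can be reproduced with a toy model. This is a sketch only and
makes no claim about how HBase is implemented: the class name, the latches, and the
timeout are all illustrative assumptions. A single worker takes a "split" task that
waits for META to become unblocked, while the "META compaction" task that would unblock
it sits behind it in the same queue.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class SingleWorkerStall {
    // Returns true if the split task finished its META update within the timeout.
    static boolean splitCompletes(int workerThreads, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(workerThreads);
        CountDownLatch metaUnblocked = new CountDownLatch(1); // released only by META's compaction
        CountDownLatch splitDone = new CountDownLatch(1);
        try {
            // Task 1: finishing a split requires an update to META, which is blocked.
            pool.execute(() -> {
                try {
                    metaUnblocked.await();
                    splitDone.countDown();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            // Task 2: META's compaction request, queued behind task 1.
            pool.execute(() -> metaUnblocked.countDown());
            return splitDone.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdownNow(); // interrupt the stuck worker so the program can exit
        }
    }

    public static void main(String[] args) {
        System.out.println("one worker, split completes: " + splitCompletes(1, 500));
        System.out.println("two workers, split completes: " + splitCompletes(2, 2000));
    }
}
```

The two-worker run is there only for contrast, to show that the stall in this model comes
from the waiting task and the unblocking task sharing one thread. It is not a proposed fix.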


