hbase-issues mailing list archives

From "Kannan Muthukkaruppan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2439) HBase can get stuck if updates to META are blocked
Date Wed, 14 Apr 2010 17:38:48 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856979#action_12856979 ]

Kannan Muthukkaruppan commented on HBASE-2439:


<<< eventually the RS just devolves into repeatedly writing:
2010-04-13 19:20:17,396 WARN org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Region
.META.,,1 has too many store files, putting it back at the end of the flush queue.

yes, that's exactly what I ran into as well.

Re: <<<< I think we should commit the whole thing for the durability  branch,

and everything but the table descriptor change for the 0.20.4 branch >>>>

I am a little confused about 0.20.4 vs. 0.20.5 vs. the durability branch. I see
some issues moved to 0.20.5. Is 0.20's trunk now essentially 0.20.5?
And is there a separate durability branch outside of the 0.20.x series?


> HBase can get stuck if updates to META are blocked
> --------------------------------------------------
>                 Key: HBASE-2439
>                 URL: https://issues.apache.org/jira/browse/HBASE-2439
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Kannan Muthukkaruppan
>         Attachments: 2439_0.20_dont_block_meta.txt
> (We noticed this on a import-style test in a small test cluster.)
> If compactions are running slow, and we are doing a lot of region splits, then, since
> META has a much smaller hard-coded memstore flush size (16KB), it quickly accumulates lots
> of store files. Once this count exceeds "hbase.hstore.blockingStoreFiles", flushes to META
> become no-ops. This causes META's memstore footprint to grow. Once it exceeds
> "hbase.hregion.memstore.block.multiplier * 16KB", further updates to META are blocked.
> In my test setup:
>   hbase.hregion.memstore.block.multiplier = 4.
> and,
>   hbase.hstore.blockingStoreFiles = 15.
> And we saw messages of the form:
> {code}
> 2010-04-09 18:37:39,539 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates
> for 'IPC Server handler 23 on 60020' on region .META.,,1: memstore size 64.2k is >= than
> blocking 64.0k size
> {code}
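The numbers in that log line follow directly from the settings above. As a quick sanity check (a sketch; the property names are taken from the description, and 16KB is the hard-coded .META. flush size it mentions):

```python
# Reproduce the blocking threshold seen in the log line above.
META_FLUSH_SIZE = 16 * 1024   # hard-coded memstore flush size for .META. (16KB)
block_multiplier = 4          # hbase.hregion.memstore.block.multiplier in this test setup

# Updates to a region are blocked once its memstore exceeds
# multiplier * flush size.
blocking_size = block_multiplier * META_FLUSH_SIZE
print(blocking_size / 1024.0)          # 64.0 -> the "blocking 64.0k size" in the log

memstore_size = 64.2 * 1024            # the "memstore size 64.2k" in the log
print(memstore_size >= blocking_size)  # True -> updates to .META. get blocked
```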
> Now, suppose that around the same time the CompactSplitThread does a compaction and determines
> it is going to split the region. As part of finishing the split, it wants to update META about
> the daughter regions.
> It'll end up waiting for META to become unblocked. The single CompactSplitThread
> is now held up, and no further compactions can proceed. META's own compaction request is
> blocked because the compaction queue will never get cleared.
> This essentially creates a deadlock, and the region server is unable to make any further
> progress. Eventually, each region server's CompactSplitThread ends up in the same state.

This message is automatically generated by JIRA.
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

