hbase-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size
Date Sat, 28 Apr 2012 21:56:48 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264414#comment-13264414

Hudson commented on HBASE-5611:

Integrated in HBase-0.94 #158 (See [https://builds.apache.org/job/HBase-0.94/158/])
    HBASE-5611 Revert due to consistent TestChangingEncoding failure in 0.94 branch (Revision

     Result = FAILURE
tedyu : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerAccounting.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java

> Replayed edits from regions that failed to open during recovery aren't removed from the
global MemStore size
> ------------------------------------------------------------------------------------------------------------
>                 Key: HBASE-5611
>                 URL: https://issues.apache.org/jira/browse/HBASE-5611
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.6
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jieshan Bean
>            Priority: Critical
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>         Attachments: 5611-94.addendum, HBASE-5611-92.patch, HBASE-5611-94-minorchange.patch,
> This bug is fairly easy to hit when the {{TimeoutMonitor}} is on; otherwise I think it's still
possible to hit it if a region fails to open for more obscure reasons, like HDFS errors.
> Consider a region that just went through distributed splitting and is now being opened
by a new RS. The first thing the RS does is read the recovery files and put the edits in the
{{MemStores}}. If this process takes a long time, the master will move that region away. At
that point the edits are still accounted for in the global {{MemStore}} size, but they are
dropped when the {{HRegion}} gets cleaned up. The problem is completely invisible until the {{MemStoreFlusher}}
needs to force-flush a region and none of the regions has any edits:
> {noformat}
> 2012-03-21 00:33:39,303 DEBUG org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush
thread woke up because memory above low water=5.9g
> 2012-03-21 00:33:39,303 ERROR org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache
flusher failed for entry null
> java.lang.IllegalStateException
>         at com.google.common.base.Preconditions.checkState(Preconditions.java:129)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
>         at java.lang.Thread.run(Thread.java:662)
> {noformat}
> The {{null}} here is a region. In my case I had so many edits in the {{MemStore}} during
recovery that the accounting puts me over the low barrier, although in fact I'm at 0. It happened yesterday and
it is still printing this out.
> To fix this we need to be able to decrease the global {{MemStore}} size when the region
can't open.
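
The accounting problem described above can be illustrated with a minimal sketch. It is not the committed patch: the class name {{ReplayAccountingSketch}} and all of its methods are hypothetical, chosen only to show one way a region server could remember how many bytes each opening region replayed, so that the global counter can be decreased if the open fails.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch only; names and structure are assumptions,
// not the actual HBase RegionServerAccounting API.
class ReplayAccountingSketch {
    // Global MemStore size as seen by the flusher.
    private final AtomicLong globalMemstoreSize = new AtomicLong(0);
    // Bytes added by replaying recovered edits, tracked per opening region.
    private final ConcurrentMap<String, AtomicLong> replayEditsPerRegion =
        new ConcurrentHashMap<>();

    // Called while recovered edits are replayed into a region's MemStores.
    long addReplayedEdits(String regionName, long bytes) {
        replayEditsPerRegion
            .computeIfAbsent(regionName, k -> new AtomicLong())
            .addAndGet(bytes);
        return globalMemstoreSize.addAndGet(bytes);
    }

    // Called when the region fails to open: the replayed edits are dropped
    // with the region, so they must leave the global size as well.
    long rollbackReplayedEdits(String regionName) {
        AtomicLong replayed = replayEditsPerRegion.remove(regionName);
        if (replayed == null) {
            return globalMemstoreSize.get(); // nothing tracked for this region
        }
        return globalMemstoreSize.addAndGet(-replayed.get());
    }

    // Called when the region opens successfully: its edits now live in real
    // MemStores, so only the per-region tracking entry is discarded.
    void clearReplayedEdits(String regionName) {
        replayEditsPerRegion.remove(regionName);
    }

    long getGlobalMemstoreSize() {
        return globalMemstoreSize.get();
    }
}
```

In this sketch, a failed open calls {{rollbackReplayedEdits}} and a successful open calls {{clearReplayedEdits}}, so the global size only keeps bytes that some live region can actually flush.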

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

