Date: Wed, 2 May 2012 03:25:01 +0000 (UTC)
From: "Jieshan Bean (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13266313#comment-13266313 ]

Jieshan Bean commented on HBASE-5611:
-------------------------------------

Thank you very much, Ted.

> Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5611
>                 URL: https://issues.apache.org/jira/browse/HBASE-5611
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.6
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jieshan Bean
>            Priority: Critical
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: 5611-94-v2.txt, 5611-94.addendum, HBASE-5611-92.patch, HBASE-5611-94-minorchange.patch, HBASE-5611-trunk-v2-minorchange.patch
>
>
> This bug is rather easy to hit if the {{TimeoutMonitor}} is on; otherwise I think it's still possible to hit it if a region fails to open for more obscure reasons such as HDFS errors.
> Consider a region that just went through distributed splitting and that's now being opened by a new RS. The first thing it does is read the recovery files and put the edits in the {{MemStores}}. If this process takes a long time, the master will move that region away. At that point the edits are still accounted for in the global {{MemStore}} size, but they are dropped when the {{HRegion}} gets cleaned up. It's completely invisible until the {{MemStoreFlusher}} needs to force flush a region and none of them have edits:
> {noformat}
> 2012-03-21 00:33:39,303 DEBUG org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up because memory above low water=5.9g
> 2012-03-21 00:33:39,303 ERROR org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed for entry null
> java.lang.IllegalStateException
>         at com.google.common.base.Preconditions.checkState(Preconditions.java:129)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
>         at java.lang.Thread.run(Thread.java:662)
> {noformat}
> The {{null}} here is a region. In my case I had so many edits in the {{MemStore}} during recovery that I'm over the low barrier although in fact I'm at 0. It happened yesterday and it is still printing this out.
> To fix this we need to be able to decrease the global {{MemStore}} size when the region can't open.
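
The accounting problem described in the issue can be illustrated with a minimal sketch. It is not the patch attached to HBASE-5611; the class `ReplayAccounting` and all of its method names are hypothetical and chosen only for illustration. The idea, under those assumptions, is to track how many bytes of replayed edits each opening region has added to the global counter, so that the same amount can be subtracted if the open fails.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical stand-in for a region server's global MemStore accounting.
 * Names are illustrative assumptions, not HBase's actual API.
 */
class ReplayAccounting {
    // Total bytes the flusher believes are held in all MemStores.
    private final AtomicLong globalMemStoreSize = new AtomicLong();
    // Bytes added by recovery replay, per region that is still opening.
    private final ConcurrentMap<String, AtomicLong> replaySizeByRegion =
        new ConcurrentHashMap<>();

    /** Record a replayed edit for a region that is being opened. */
    long addReplayedEdit(String regionName, long size) {
        replaySizeByRegion
            .computeIfAbsent(regionName, k -> new AtomicLong())
            .addAndGet(size);
        return globalMemStoreSize.addAndGet(size);
    }

    /** The region failed to open: give back everything its replay added. */
    long rollbackReplayedEdits(String regionName) {
        AtomicLong replayed = replaySizeByRegion.remove(regionName);
        if (replayed == null) {
            return globalMemStoreSize.get();
        }
        return globalMemStoreSize.addAndGet(-replayed.get());
    }

    /** The region opened successfully: keep the bytes, drop the bookkeeping. */
    void clearReplayedEdits(String regionName) {
        replaySizeByRegion.remove(regionName);
    }

    long getGlobalMemStoreSize() {
        return globalMemStoreSize.get();
    }
}
```

In this sketch, a failed open followed by `rollbackReplayedEdits` returns the global size to what it was before the replay, so a flusher comparing that size against the low-water mark would not be woken for edits that no longer exist.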