Return-Path: X-Original-To: apmail-hbase-issues-archive@www.apache.org Delivered-To: apmail-hbase-issues-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id E248B9536 for ; Thu, 26 Apr 2012 14:48:37 +0000 (UTC) Received: (qmail 62295 invoked by uid 500); 26 Apr 2012 14:48:37 -0000 Delivered-To: apmail-hbase-issues-archive@hbase.apache.org Received: (qmail 62263 invoked by uid 500); 26 Apr 2012 14:48:37 -0000 Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list issues@hbase.apache.org Received: (qmail 62251 invoked by uid 99); 26 Apr 2012 14:48:37 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 26 Apr 2012 14:48:37 +0000 X-ASF-Spam-Status: No, hits=-2000.0 required=5.0 tests=ALL_TRUSTED,T_RP_MATCHES_RCVD X-Spam-Check-By: apache.org Received: from [140.211.11.116] (HELO hel.zones.apache.org) (140.211.11.116) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 26 Apr 2012 14:48:36 +0000 Received: from hel.zones.apache.org (hel.zones.apache.org [140.211.11.116]) by hel.zones.apache.org (Postfix) with ESMTP id 8FCBF4101C0 for ; Thu, 26 Apr 2012 14:48:16 +0000 (UTC) Date: Thu, 26 Apr 2012 14:48:16 +0000 (UTC) From: "Zhihong Yu (JIRA)" To: issues@hbase.apache.org Message-ID: <1302685224.5421.1335451696590.JavaMail.tomcat@hel.zones.apache.org> In-Reply-To: <766010168.43379.1332355182407.JavaMail.tomcat@hel.zones.apache.org> Subject: [jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 X-Virus-Checked: Checked by ClamAV on apache.org [ https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13262648#comment-13262648 ] Zhihong Yu commented on HBASE-5611: ----------------------------------- @Jieshan: When you have multiple patches for different branches, attach patch for trunk apart from the other patches. Otherwise Hadoop QA may pick up the wrong patch. > Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size > ------------------------------------------------------------------------------------------------------------ > > Key: HBASE-5611 > URL: https://issues.apache.org/jira/browse/HBASE-5611 > Project: HBase > Issue Type: Bug > Affects Versions: 0.90.6 > Reporter: Jean-Daniel Cryans > Assignee: Jieshan Bean > Priority: Critical > Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1 > > Attachments: HBASE-5611-94.patch, HBASE-5611-trunk-v2.patch, HBASE-5611-trunk.patch > > > This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think it's still possible to hit it if a region fails to open for more obscure reasons like HDFS errors. > Consider a region that just went through distributed splitting and that's now being opened by a new RS. The first thing it does is to read the recovery files and put the edits in the {{MemStores}}. If this process takes a long time, the master will move that region away. At that point the edits are still accounted for in the global {{MemStore}} size but they are dropped when the {{HRegion}} gets cleaned up. It's completely invisible until the {{MemStoreFlusher}} needs to force flush a region and that none of them have edits: > {noformat} > 2012-03-21 00:33:39,303 DEBUG org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up because memory above low water=5.9g > 2012-03-21 00:33:39,303 ERROR org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed for entry null > java.lang.IllegalStateException > at com.google.common.base.Preconditions.checkState(Preconditions.java:129) > at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199) > at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223) > at java.lang.Thread.run(Thread.java:662) > {noformat} > The {{null}} here is a region. In my case I had so many edits in the {{MemStore}} during recovery that I'm over the low barrier although in fact I'm at 0. It happened yesterday and it still printing this out. > To fix this we need to be able to decrease the global {{MemStore}} size when the region can't open. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira