Return-Path: X-Original-To: apmail-hbase-issues-archive@www.apache.org Delivered-To: apmail-hbase-issues-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 649ADC061 for ; Fri, 27 Apr 2012 05:14:30 +0000 (UTC) Received: (qmail 17000 invoked by uid 500); 27 Apr 2012 05:14:29 -0000 Delivered-To: apmail-hbase-issues-archive@hbase.apache.org Received: (qmail 16726 invoked by uid 500); 27 Apr 2012 05:14:29 -0000 Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list issues@hbase.apache.org Received: (qmail 16315 invoked by uid 99); 27 Apr 2012 05:14:26 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 27 Apr 2012 05:14:26 +0000 X-ASF-Spam-Status: No, hits=-2000.0 required=5.0 tests=ALL_TRUSTED,T_RP_MATCHES_RCVD X-Spam-Check-By: apache.org Received: from [140.211.11.116] (HELO hel.zones.apache.org) (140.211.11.116) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 27 Apr 2012 05:14:23 +0000 Received: from hel.zones.apache.org (hel.zones.apache.org [140.211.11.116]) by hel.zones.apache.org (Postfix) with ESMTP id 5A7824222E9 for ; Fri, 27 Apr 2012 05:14:02 +0000 (UTC) Date: Fri, 27 Apr 2012 05:14:02 +0000 (UTC) From: "Hadoop QA (JIRA)" To: issues@hbase.apache.org Message-ID: <574294178.1462.1335503642402.JavaMail.tomcat@hel.zones.apache.org> In-Reply-To: <766010168.43379.1332355182407.JavaMail.tomcat@hel.zones.apache.org> Subject: [jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13263347#comment-13263347 ] Hadoop QA commented on HBASE-5611: ---------------------------------- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12524810/HBASE-5611-trunk-v2-minorchange.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1666//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1666//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1666//console This message is automatically generated. > Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size > ------------------------------------------------------------------------------------------------------------ > > Key: HBASE-5611 > URL: https://issues.apache.org/jira/browse/HBASE-5611 > Project: HBase > Issue Type: Bug > Affects Versions: 0.90.6 > Reporter: Jean-Daniel Cryans > Assignee: Jieshan Bean > Priority: Critical > Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1 > > Attachments: HBASE-5611-92.patch, HBASE-5611-94-minorchange.patch, HBASE-5611-trunk-v2-minorchange.patch > > > This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think it's still possible to hit it if a region fails to open for more obscure reasons like HDFS errors. > Consider a region that just went through distributed splitting and that's now being opened by a new RS. The first thing it does is to read the recovery files and put the edits in the {{MemStores}}. If this process takes a long time, the master will move that region away. At that point the edits are still accounted for in the global {{MemStore}} size but they are dropped when the {{HRegion}} gets cleaned up. It's completely invisible until the {{MemStoreFlusher}} needs to force flush a region and that none of them have edits: > {noformat} > 2012-03-21 00:33:39,303 DEBUG org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up because memory above low water=5.9g > 2012-03-21 00:33:39,303 ERROR org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed for entry null > java.lang.IllegalStateException > at com.google.common.base.Preconditions.checkState(Preconditions.java:129) > at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199) > at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223) > at java.lang.Thread.run(Thread.java:662) > {noformat} > The {{null}} here is a region. In my case I had so many edits in the {{MemStore}} during recovery that I'm over the low barrier although in fact I'm at 0. It happened yesterday and it still printing this out. > To fix this we need to be able to decrease the global {{MemStore}} size when the region can't open. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira