From: "Prakash Khemani (JIRA)"
To: issues@hbase.apache.org
Date: Tue, 26 Apr 2011 18:17:04 +0000 (UTC)
Message-ID: <1653428784.3290.1303841824237.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Created] (HBASE-3822) region server stuck in waitOnAllRegionsToClose

region server stuck in waitOnAllRegionsToClose
----------------------------------------------

                 Key: HBASE-3822
                 URL: https://issues.apache.org/jira/browse/HBASE-3822
             Project: HBase
          Issue Type: Bug
            Reporter: Prakash Khemani


The regionserver is not able to exit because the rs thread is stuck here:

"regionserver60020" prio=10 tid=0x00002ab2b039e000 nid=0x760a waiting on condition [0x000000004365e000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.hbase.util.Threads.sleep(Threads.java:126)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.waitOnAllRegionsToClose(HRegionServer.java:736)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:689)
        at java.lang.Thread.run(Thread.java:619)

===

In CloseRegionHandler.process() we do not call removeFromOnlineRegions() if there is an exception. (In this case I suspect there was a log-rolling exception because of another issue.)

    // Close the region
    try {
      // TODO: If we need to keep updating CLOSING stamp to prevent against
      // a timeout if this is long-running, need to spin up a thread?
      if (region.close(abort) == null) {
        // This region got closed.  Most likely due to a split. So instead
        // of doing the setClosedState() below, let's just ignore and continue.
        // The split message will clean up the master state.
        LOG.warn("Can't close region: was already closed during close(): " +
          regionInfo.getRegionNameAsString());
        return;
      }
    } catch (IOException e) {
      LOG.error("Unrecoverable exception while closing region " +
        regionInfo.getRegionNameAsString() + ", still finishing close", e);
    }

    this.rsServices.removeFromOnlineRegions(regionInfo.getEncodedName());

===

I think that once we set the closing flag on the region, it won't take any more requests, so it is as good as offline. Either we should refine the check in waitOnAllRegionsToClose(), or CloseRegionHandler.process() should remove the region from the online-regions set.
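To illustrate the second option, here is a minimal, self-contained sketch of a close handler that removes the region from the online set in a finally block, so the removal happens whether close() succeeds, throws an IOException, or throws an unchecked exception. This is not the HBase code and not a proposed patch: CloseRegionSketch, Region, OnlineRegions, closeAndRemove() and failingRegion() are all hypothetical stand-ins, and the "already closed by a split" early return from the snippet above is left out for brevity.

```java
import java.io.IOException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-ins for illustration; none of these are real HBase classes. */
class CloseRegionSketch {

  /** Stand-in for a region whose close() may fail. */
  interface Region {
    String encodedName();
    void close() throws IOException;
  }

  /** Stand-in for the region server's online-regions set. */
  static final class OnlineRegions {
    private final Set<String> names = ConcurrentHashMap.newKeySet();
    void add(String encodedName) { names.add(encodedName); }
    void remove(String encodedName) { names.remove(encodedName); }
    boolean isEmpty() { return names.isEmpty(); }
  }

  /**
   * Closes the region and removes it from the online set no matter how
   * close() ends. An IOException is logged and swallowed; an unchecked
   * exception still propagates, but only after the removal has happened.
   */
  static void closeAndRemove(Region region, OnlineRegions online) {
    try {
      region.close();
    } catch (IOException e) {
      System.err.println("Exception while closing region " + region.encodedName()
          + ", still finishing close: " + e);
    } finally {
      // Assumption: a region that has started closing serves no more requests,
      // so it is safe to treat it as offline for shutdown purposes.
      online.remove(region.encodedName());
    }
  }

  /** Builds a region whose close() throws the given failure, or succeeds if it is null. */
  static Region failingRegion(String name, Exception failure) {
    return new Region() {
      public String encodedName() { return name; }
      public void close() throws IOException {
        if (failure instanceof IOException) throw (IOException) failure;
        if (failure instanceof RuntimeException) throw (RuntimeException) failure;
      }
    };
  }

  public static void main(String[] args) {
    OnlineRegions online = new OnlineRegions();
    online.add("r1");
    online.add("r2");
    closeAndRemove(failingRegion("r1", null), online);
    closeAndRemove(failingRegion("r2", new IOException("simulated log-roll failure")), online);
    // A shutdown loop that polls for an empty online set could now exit.
    System.out.println("online regions empty: " + online.isEmpty());
  }
}
```

The trade-off in this sketch is that a region whose close failed is reported as no longer online even though its resources may not have been released cleanly. Whether that is acceptable during shutdown is the question the issue raises.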