Date: Sat, 10 Sep 2011 08:47:09 +0000 (UTC)
From: "Hudson (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-4340) Hbase can't balance if ServerShutdownHandler encountered exception

[ https://issues.apache.org/jira/browse/HBASE-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102017#comment-13102017 ]

Hudson commented on HBASE-4340:
-------------------------------

Integrated in HBase-TRUNK #2196 (See [https://builds.apache.org/job/HBase-TRUNK/2196/])
    HBASE-4340 Hbase can't balance if ServerShutdownHandler encountered exception (Jinchao Gao)

tedyu :
Files :
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java

> Hbase can't balance if ServerShutdownHandler encountered exception
> ------------------------------------------------------------------
>
>                 Key: HBASE-4340
>                 URL: https://issues.apache.org/jira/browse/HBASE-4340
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: gaojinchao
>            Assignee: gaojinchao
>             Fix For: 0.90.5
>
>         Attachments: HBASE-4340_branch90.patch
>
>
> Version: 0.90.4
> Cluster: 40 boxes
> As the logs below show, the balancer could not run because of a dead RS.
> I dug deeper and found two issues:
> 1. ServerShutdownHandler didn't clear numProcessing when it hit some exceptions. Whatever the exception, we should either clear the flag or shut down the master.
> 2. "dead regionserver(s): [158-1-130-12,20020,1314971097929]" is inaccurate.
> The dead server should be "158-1-130-10,20020,1315068597979".
> // master logs:
> 2011-09-05 00:28:00,487 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929]
> 2011-09-05 00:33:00,489 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929]
> 2011-09-05 00:38:00,493 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929]
> 2011-09-05 00:43:00,495 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929]
> 2011-09-05 00:48:00,499 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929]
> 2011-09-05 00:53:00,501 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929]
> 2011-09-05 00:58:00,501 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:03:00,502 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:08:00,506 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:13:00,508 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:18:00,512 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:23:00,514 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:28:00,518 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:33:00,520 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:38:00,524 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:43:00,526 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:48:00,530 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:53:00,532 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929]
> 2011-09-05 01:58:00,536 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929]
> 2011-09-05 02:03:00,537 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929]
> 2011-09-05 02:08:00,538 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929]
> 2011-09-05 02:13:00,539 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929]
> 2011-09-05 02:18:00,543 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929]
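
The balancer log lines above repeat every five minutes because the master still considers a dead server to be in processing. The report suggests that the flag should be cleared whatever exception occurs. Below is a minimal Java sketch of that idea using try/finally. DeadServerSketch, handleServerShutdown, canBalance and inProgress are hypothetical names, not HBase's actual classes or API. A set of server names stands in for the numProcessing counter, so the "Not running balancer" message lists only servers that are still being processed. This is one possible way to address issue 2 and is an assumption, not what the committed patch does.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch, not HBase code: tracks dead servers whose shutdown
 * handling is in progress and gates the balancer on that state.
 */
public class DeadServerSketch {

    /** Servers whose shutdown handling has started but not yet finished. */
    private final Set<String> processing = ConcurrentHashMap.newKeySet();

    /** The balancer may run only when no dead server is mid-processing. */
    public boolean canBalance() {
        if (!processing.isEmpty()) {
            // Name only the servers actually being processed, not every dead server.
            System.out.println("Not running balancer because processing dead regionserver(s): " + processing);
            return false;
        }
        return true;
    }

    /** Read-only view of the servers currently being processed. */
    public Set<String> inProgress() {
        return Collections.unmodifiableSet(processing);
    }

    /**
     * Runs the shutdown work for one server. The finally block clears the
     * in-progress marker even if the work throws, so a single failure
     * cannot block the balancer indefinitely.
     */
    public void handleServerShutdown(String serverName, Runnable work) {
        processing.add(serverName);
        try {
            work.run();
        } finally {
            processing.remove(serverName);
        }
    }

    public static void main(String[] args) {
        DeadServerSketch tracker = new DeadServerSketch();
        try {
            tracker.handleServerShutdown("158-1-130-10,20020,1315068597979", () -> {
                throw new NullPointerException("simulated META read failure");
            });
        } catch (NullPointerException e) {
            System.out.println("Shutdown handling failed: " + e.getMessage());
        }
        System.out.println("Balancer allowed: " + tracker.canBalance());
    }
}
```

Running main prints the simulated failure followed by "Balancer allowed: true", because the finally block removed the server even though the work threw. The report's alternative, shutting down the master on any exception, avoids running with a half-processed server but costs availability.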
> // the exception logs:
> 2011-09-03 18:13:27,550 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=158-1-133-11,20020,1315069437236, region=0db4088d75c58dd22f93f389d90ba6cc
> 2011-09-03 18:13:27,550 ERROR org.apache.hadoop.hbase.executor.EventHandler: Caught throwable while processing event M_SERVER_SHUTDOWN
> java.lang.NullPointerException
>     at org.apache.hadoop.hbase.util.Bytes.toLong(Bytes.java:480)
>     at org.apache.hadoop.hbase.util.Bytes.toLong(Bytes.java:454)
>     at org.apache.hadoop.hbase.catalog.MetaReader.metaRowToRegionPairWithInfo(MetaReader.java:400)
>     at org.apache.hadoop.hbase.catalog.MetaReader.getServerUserRegions(MetaReader.java:591)
>     at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:176)
>     at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:156)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
> 2011-09-03 18:13:27,550 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs for 158-1-134-15,20020,1315065238916
> 2011-09-03 18:13:27,566 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for ufdr,001146,1314955304624.22f6d43e78c903196f206881fc149488. so generated a random one; hri=ufdr,001146,1314955304624.22f6d43e78c903196f206881fc149488., src=, dest=158-1-132-17,20020,1315069441916; 31 (online=31, exclude=null) available servers
> 201

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
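
In the exception trace above, the NullPointerException is thrown inside Bytes.toLong, called from MetaReader.metaRowToRegionPairWithInfo while ServerShutdownHandler was reading the dead server's regions from META. This presumably means a META cell value (likely the server start code) was missing. According to the Files list, the committed change touched only ServerShutdownHandler.java, so the code below is not that fix. It is a hypothetical sketch of a defensive decode that returns "no value" instead of throwing. It assumes the cell holds an 8-byte big-endian long. StartCodeDecoder and decodeStartCode are illustrative names, not HBase API.

```java
import java.nio.ByteBuffer;
import java.util.OptionalLong;

/** Hypothetical sketch, not HBase code: decode a start-code cell without NPE. */
public class StartCodeDecoder {

    /**
     * Decodes an 8-byte big-endian long. Returns empty instead of throwing
     * when the cell is missing (null) or has an unexpected length.
     */
    public static OptionalLong decodeStartCode(byte[] value) {
        if (value == null || value.length != Long.BYTES) {
            return OptionalLong.empty();
        }
        // ByteBuffer uses big-endian byte order by default.
        return OptionalLong.of(ByteBuffer.wrap(value).getLong());
    }

    public static void main(String[] args) {
        byte[] present = ByteBuffer.allocate(Long.BYTES).putLong(1315068597979L).array();
        System.out.println(decodeStartCode(present)); // OptionalLong[1315068597979]
        System.out.println(decodeStartCode(null));    // OptionalLong.empty
    }
}
```

A caller would skip or log a row whose start code comes back empty, rather than letting the exception escape the shutdown handler.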