Date: Wed, 15 May 2013 03:33:21 +0000 (UTC)
From: "Hudson (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-8537) Dead region server pulled in from ZK

    [ https://issues.apache.org/jira/browse/HBASE-8537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13657855#comment-13657855 ]

Hudson commented on HBASE-8537:
-------------------------------

Integrated in hbase-0.95-on-hadoop2 #99 (See [https://builds.apache.org/job/hbase-0.95-on-hadoop2/99/])
    HBASE-8537 Dead region server pulled in from ZK (Revision 1482636)

     Result = FAILURE
jxiang :
Files :
* /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/ServerName.java
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
* /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterNoCluster.java


> Dead region server pulled in from ZK
> ------------------------------------
>
>                 Key: HBASE-8537
>                 URL: https://issues.apache.org/jira/browse/HBASE-8537
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>            Reporter: Jimmy Xiang
>            Assignee: Jimmy Xiang
>            Priority: Minor
>             Fix For: 0.98.0, 0.95.1
>
>         Attachments: trunk-8537.patch, trunk-8537_v2.patch, trunk-8537_v3.patch
>
>
> When a cluster restarts quickly after it has crashed, the master still pulls in the dead region server from ZK, even though a new region server has already reported in.
> {noformat}
> 2013-05-12 18:32:52,996 INFO  [IPC Server handler 6 on 36000] org.apache.hadoop.hbase.master.ServerManager: Registering server=a1217.halxg.cloudera.com,36020,1368408767773
> ....
> 2013-05-12 18:32:54,653 INFO  [master-a1220.halxg.cloudera.com,36000,1368408767520] org.apache.hadoop.hbase.master.HMaster: Registering server found up in zk but who has not yet reported in: a1217.halxg.cloudera.com,36020,1368378273768
> 2013-05-12 18:32:54,653 INFO  [master-a1220.halxg.cloudera.com,36000,1368408767520] org.apache.hadoop.hbase.master.ServerManager: Registering server=a1217.halxg.cloudera.com,36020,1368378273768
> {noformat}
> We should not pull in the second region server instance from ZK. It is actually dead. We can figure this out from the hostname and the port: we can assume that no two region server instances can be alive on the same host and port. To be more cautious, we can check the timestamp as well. The live one should be the one with the later timestamp, not the one pulled in from ZK.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
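
The check proposed in the issue description (same hostname and port, later timestamp wins) can be sketched roughly as below. This is only an illustration and not the committed patch: `StaleServerCheck`, `Instance`, and `isStale` are hypothetical names, and the code assumes the `host,port,timestamp` string form seen in the log lines above rather than using any HBase class.

```java
import java.util.ArrayList;
import java.util.List;

public class StaleServerCheck {

    /** Hypothetical stand-in for a region server identity: host, port, start timestamp. */
    static final class Instance {
        final String host;
        final int port;
        final long startCode;

        Instance(String host, int port, long startCode) {
            this.host = host;
            this.port = port;
            this.startCode = startCode;
        }

        /** Parses the "host,port,timestamp" form that appears in the master log. */
        static Instance parse(String s) {
            String[] parts = s.split(",");
            if (parts.length != 3) {
                throw new IllegalArgumentException("Expected host,port,timestamp: " + s);
            }
            return new Instance(parts[0], Integer.parseInt(parts[1]), Long.parseLong(parts[2]));
        }

        boolean sameHostAndPort(Instance other) {
            return host.equalsIgnoreCase(other.host) && port == other.port;
        }
    }

    /**
     * Returns true if an already registered instance shares the host and port
     * of the instance found in ZK and has a later start timestamp.
     */
    static boolean isStale(Instance fromZk, List<Instance> registered) {
        for (Instance live : registered) {
            if (live.sameHostAndPort(fromZk) && live.startCode > fromZk.startCode) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Values taken from the log excerpt in the issue description.
        List<Instance> registered = new ArrayList<>();
        registered.add(Instance.parse("a1217.halxg.cloudera.com,36020,1368408767773"));
        Instance fromZk = Instance.parse("a1217.halxg.cloudera.com,36020,1368378273768");
        System.out.println(isStale(fromZk, registered)); // prints "true"
    }
}
```

With the two instances from the log, the one found in ZK has the earlier timestamp on the same host and port, so under these assumptions it would be skipped instead of registered.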