Date: Thu, 2 Jan 2014 21:47:50 +0000 (UTC)
From: "Jean-Daniel Cryans (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Created] (HBASE-10271) [regression] Cannot use the wildcard address since HBASE-9593

Jean-Daniel Cryans created HBASE-10271:
------------------------------------------

             Summary: [regression] Cannot use the wildcard address since HBASE-9593
                 Key: HBASE-10271
                 URL: https://issues.apache.org/jira/browse/HBASE-10271
             Project: HBase
          Issue Type: Bug
    Affects Versions: 0.96.1, 0.94.13
            Reporter: Jean-Daniel Cryans
            Priority: Critical

HBASE-9593 moved the creation of the ephemeral znode earlier in the region server startup process such that we don't have access to the ServerName from the Master's POV. HRS.getMyEphemeralNodePath() calls HRS.getServerName() which at that point will return this.isa.getHostName(). If you set hbase.regionserver.ipc.address to 0.0.0.0, you will create a znode with that address.
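
To illustrate the wildcard case described above, here is a minimal sketch of how a bind address could be checked before its host name is used in a znode name. The class and method names (WildcardAddressCheck, isWildcard) are hypothetical and are not part of HBase; only the java.net calls are real.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;

/** Hypothetical helper, not HBase code: detects a wildcard bind address. */
class WildcardAddressCheck {

    /**
     * Returns true when the socket address is bound to the wildcard
     * ("any local") address, e.g. 0.0.0.0. Such an address identifies
     * no particular host, so it is a poor choice for a znode name.
     */
    static boolean isWildcard(InetSocketAddress isa) {
        InetAddress addr = isa.getAddress(); // null if the address is unresolved
        return addr != null && addr.isAnyLocalAddress();
    }

    public static void main(String[] args) {
        // 60020 is the region server port that appears in the log lines of this issue.
        InetSocketAddress wildcard = new InetSocketAddress("0.0.0.0", 60020);
        InetSocketAddress loopback = new InetSocketAddress("127.0.0.1", 60020);
        System.out.println("wildcard? " + isWildcard(wildcard));
        System.out.println("loopback wildcard? " + isWildcard(loopback));
    }
}
```

A check like this could also be a starting point for the unit test on the wildcard address, under the assumption that the test only needs to tell a wildcard bind address from a concrete one.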
What happens next is that the RS reports for duty correctly, but the master does this:

{noformat}
2014-01-02 11:45:49,498 INFO [master:172.21.3.117:60000] master.ServerManager: Registering server=0:0:0:0:0:0:0:0%0,60020,1388691892014
2014-01-02 11:45:49,498 INFO [master:172.21.3.117:60000] master.HMaster: Registered server found up in zk but who has not yet reported in: 0:0:0:0:0:0:0:0%0,60020,1388691892014
{noformat}

The cluster is then unusable.

I think a better solution is to track the heartbeats for the region servers and expire those that haven't checked in for some time. The 0.89-fb branch has this concept, and they also use it to detect rack failures: https://github.com/apache/hbase/blob/0.89-fb/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java#L1224.

In this jira's scope I would just add the heartbeat tracking and add a unit test for the wildcard address.

What do you think [~rajesh23]?
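
The heartbeat tracking proposed above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class HeartbeatTracker, its methods, and the timeout value are hypothetical, and it does not reproduce the ServerManager code in the 0.89-fb branch or any HBase API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch, not HBase code: expires servers that stop heartbeating. */
class HeartbeatTracker {
    private final long timeoutMs;
    // Server name -> timestamp (ms) of the last heartbeat or registration.
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();

    HeartbeatTracker(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    /** Records a heartbeat (or the initial registration) for a server. */
    void recordHeartbeat(String serverName, long nowMs) {
        lastSeen.put(serverName, nowMs);
    }

    /** Removes and returns every server whose last heartbeat is older than the timeout. */
    List<String> expireStale(long nowMs) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastSeen.entrySet()) {
            if (nowMs - e.getValue() > timeoutMs) {
                expired.add(e.getKey());
            }
        }
        for (String serverName : expired) {
            lastSeen.remove(serverName);
        }
        return expired;
    }

    boolean isTracked(String serverName) {
        return lastSeen.containsKey(serverName);
    }

    public static void main(String[] args) {
        HeartbeatTracker tracker = new HeartbeatTracker(1000L); // assumed 1 s timeout
        // A server registered from zk under the wildcard name never reports in again.
        tracker.recordHeartbeat("0:0:0:0:0:0:0:0%0,60020,1388691892014", 0L);
        // A server that keeps heartbeating stays tracked.
        tracker.recordHeartbeat("rs1.example.org,60020,1388691892014", 900L);
        System.out.println("expired: " + tracker.expireStale(1500L));
    }
}
```

Timestamps are passed in explicitly rather than read from the system clock so that the expiry logic can be unit tested deterministically. With such tracking, a server that was registered from its znode but never reports in under that name would eventually be expired instead of leaving the cluster stuck.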