Date: Sat, 5 Feb 2011 21:19:30 +0000 (UTC)
From: "stack (JIRA)"
To: issues@hbase.apache.org
Message-ID: <1577800588.2107.1296940770838.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1518695.225001294450785193.JavaMail.jira@thor>
Subject: [jira] Updated: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers

     [ https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3431:
-------------------------

    Attachment: 3431-v2.txt

The issue is that if the master sees a RegionServer differently from how the RS sees itself -- e.g. the master gets an IP when it does a lookup though the RS passed a name, or the RS passed an FQDN but the master has the hostname only -- then the master will ask the RS to take on the name the Master sees by passing it back an HServerAddress. This does not work if the two servers are getting different answers from their respective DNS servers. The Master knows RSs by their 'ServerName', which is hostname+port+startcode. If DNS is wonky, then the Master and RS will come up with different 'ServerName's even if the Master passes back its HSA (the HSA could be IP only; the RS does a lookup and comes up with a different hostname if DNS is broken).

This patch removes the code that has the master trying to tell the RS the identity to use. Instead, the Master just uses the ServerName the RS volunteered.

So far in testing it seems to work both when DNS is set up properly and when Master-side DNS is broken such that it finds an IP only for the RS. Let me do some more testing.

> Regionserver is not using the name given it by the master; double entry in master listing of servers
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3431
>                 URL: https://issues.apache.org/jira/browse/HBASE-3431
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.90.1
>
>         Attachments: 3431-v2.txt, 3431.txt
>
>
> Our man Ted Dunning found the following, where the RS checks in with one name and the master tells it to use another name, but we seem to go ahead and continue with our original name.
> In RS logs I see:
> {code}
> 2011-01-07 15:45:50,757 INFO org.apache.hadoop.hbase.regionserver.HRegionServer [regionserver60020]: Master passed us address to use. Was=perfnode11:60020, Now=10.10.30.11:60020
> {code}
> On master I see
> {code}
> 2011-01-07 15:45:38,613 INFO org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 0 on 60000]: Registering server=10.10.30.11,60020,1294443935414, regionCount=0, userLoad=false
> {code}
> ....
> then later
> {code}
> 2011-01-07 15:45:44,247 INFO org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 2 on 60000]: Registering server=perfnode11,60020,1294443935414, regionCount=0, userLoad=true
> {code}
> This might be because we started letting servers register by means other than reportStartup.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
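
To make the double-entry problem concrete, here is a minimal Java sketch of how keying servers by hostname+port+startcode can yield two entries for one process when the Master and the RS resolve different hostnames. It is an illustration only, not the HBase ServerManager code: the class ServerNameSketch and its methods are hypothetical, and the registration flow is simplified to two set insertions. The host, port, and startcode values are taken from the log lines quoted above.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch; NOT the actual HBase implementation.
public class ServerNameSketch {

    // Builds a name in the host,port,startcode form seen in the master log.
    static String serverName(String host, int port, long startcode) {
        return host + "," + port + "," + startcode;
    }

    // Simplified pre-patch behaviour: the master registers the server under
    // the host it resolved itself, while the RS keeps reporting under its own
    // host, so the same process ends up with two distinct keys.
    static Set<String> registerWithMasterView(String masterSeesHost,
                                              String rsSeesHost,
                                              int port, long startcode) {
        Set<String> onlineServers = new LinkedHashSet<>();
        onlineServers.add(serverName(masterSeesHost, port, startcode));
        onlineServers.add(serverName(rsSeesHost, port, startcode));
        return onlineServers;
    }

    // Simplified post-patch behaviour: the master uses the name the RS
    // volunteered for every registration, so the key never diverges.
    static Set<String> registerWithVolunteeredName(String rsSeesHost,
                                                   int port, long startcode) {
        Set<String> onlineServers = new LinkedHashSet<>();
        String volunteered = serverName(rsSeesHost, port, startcode);
        onlineServers.add(volunteered); // startup registration
        onlineServers.add(volunteered); // later report, same key
        return onlineServers;
    }

    public static void main(String[] args) {
        long startcode = 1294443935414L;
        System.out.println(
            registerWithMasterView("10.10.30.11", "perfnode11", 60020, startcode));
        System.out.println(
            registerWithVolunteeredName("perfnode11", 60020, startcode));
    }
}
```

In this sketch the first set holds two entries for the same RegionServer process and the second holds one, which mirrors the two "Registering server=" lines in the master log and the intent of the patch.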