hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: DNS mismatch between master and regionserver causes doubly registered regionservers
Date Sat, 23 May 2015 05:12:34 GMT
On Fri, May 22, 2015 at 10:17 AM, Bryan Beaudreault <bbeaudreault@hubspot.com> wrote:

> In our system each server has 2 DNS names associated with it; one always points
> to a private address and the other to public or private depending on the
> context.
> This issue did not show up in 0.94.x, but is showing up on my new 1.x
> cluster.  Basically it goes like this:
> 1. Regionserver starts up, gets its hostname, which returns
> `hostA.external` due to our /etc/hosts
> 2. Regionserver registers itself in zookeeper as `hostA.external`
> 3. Regionserver reports for duty to the HMaster, which re-resolves the DNS
> name and gets `hostA.internal`.
> 4. HMaster registers server as `hostA.internal`
> 5. Regionserver receives the RegionServerStartupResponse, which contains
> `hostA.internal` and uses that for its RPCs
> 6. HMaster sees a ZNode with `hostA.external`, so thinks it is a
> regionserver that hasn't checked in yet, and registers it.
> So I think the problem is that step #2 happens before step #5.  You can
> clearly see this in the HRegionServer.java run() function.
Yes. Looks like a regression.

commit 10d336a51d3a5a2694f1898e52afa01dc9dc1798
Author: rajeshbabu <rajeshbabu@unknown>
Date:   Thu Oct 24 18:26:42 2013 +0000

    HBASE-9593 Region server left in online servers list forever if it went
    down after registering to master and before creating ephemeral node

    git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1535479

The regionserver used to use the name given to it by the master, both when
registering in zk and when heartbeating to the master. We arrived at this
approach after lots of pain with doubly registered regionservers caused by
naming disagreements between cluster nodes. The above commit changed the
order and seems to have broken this facility.
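
To make the ordering problem concrete, here is a minimal toy model of the
startup sequence from the quoted steps. It is a sketch only, not HBase code:
the class `RegistrationOrderSketch`, the method `simulate`, and the two sets
standing in for ZooKeeper and the master's online-servers list are all
hypothetical names invented for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Toy model only: none of these names exist in HBase.
public class RegistrationOrderSketch {

    /**
     * Returns the server names the (simulated) master ends up tracking.
     *
     * @param localName          name the regionserver resolves for itself
     * @param masterResolvedName name the master resolves for the regionserver
     * @param znodeBeforeReport  true models the 1.x order (znode created
     *                           before reporting for duty), false the 0.94 order
     */
    static Set<String> simulate(String localName,
                                String masterResolvedName,
                                boolean znodeBeforeReport) {
        Set<String> znodes = new LinkedHashSet<>();        // stand-in for ZooKeeper
        Set<String> onlineServers = new LinkedHashSet<>(); // stand-in for the master's list

        String nameInUse = localName;
        if (znodeBeforeReport) {
            // Ephemeral node is created with the locally resolved name.
            znodes.add(nameInUse);
        }

        // Report for duty: the master resolves the name itself, registers it,
        // and sends it back in the startup response.
        onlineServers.add(masterResolvedName);
        nameInUse = masterResolvedName;

        if (!znodeBeforeReport) {
            // Ephemeral node is created with the master-supplied name.
            znodes.add(nameInUse);
        }

        // The master sees znodes for names it does not know yet and
        // registers them as servers that have not checked in.
        onlineServers.addAll(znodes);
        return onlineServers;
    }

    public static void main(String[] args) {
        // Znode first: two entries for one physical server.
        System.out.println(simulate("hostA.external", "hostA.internal", true));
        // Znode after the startup response: a single entry.
        System.out.println(simulate("hostA.external", "hostA.internal", false));
    }
}
```

With the znode created first, the model ends up with both `hostA.internal`
and `hostA.external` registered; with the znode created after the startup
response, only `hostA.internal` appears.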

Will open issue to fix....


> In 0.94, the `createMyEphemeralNode` function was called within
> `handleReportForDutyResponse`.  In 1.x, it happens within `run()` BEFORE
> `handleReportForDutyResponse`.
> I can work around this by handling /etc/hosts specially for my
> regionservers.  We have our /etc/hosts file set up like this for a reason,
> but I think I can special case regionservers.
> However, it seems like a bug that there are mechanisms built in for the
> HMaster to determine the RegionServer hostname, but that these mechanisms
> do not account for doubly-registered regionservers due to zookeeper and
> hmaster mismatch.
> I tried to create a JIRA for this, but either my username no longer has
> permissions for creating, or I can't find the place to create them
> anymore.  Any help?
> https://issues.apache.org/jira/secure/ViewProfile.jspa?name=bbeaudreault
