hbase-user mailing list archives

From Bryan Beaudreault <bbeaudrea...@hubspot.com>
Subject Re: DNS mismatch between master and regionserver causes doubly registered regionservers
Date Fri, 22 May 2015 19:40:32 GMT
HBASE-12954 looks like it would solve my issue, but it is not in cdh5.4.0. I
also don't think it fixes what I believe the real bug is; it's more of a
workaround.

In terms of the actual bug, I think one of at least two possible solutions
should be considered:

1. Remove the support for hostnameFromMasterPOV in
HRegionServer#handleReportForDutyResponse

2. Move HRegionServer#createMyEphemeralNode in HRegionServer#run to *after* the
call to HRegionServer#handleReportForDutyResponse.  This way, any new
hostname returned by the HMaster would be reflected in the ZNodes created
in createMyEphemeralNode.
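
A minimal, hypothetical Java sketch of the race in the quoted report below
(steps 1-6) and of the reordering in option 2. It uses no HBase classes:
`ToyMaster`, `Ordering` and `startUp()` are made-up stand-ins for the HMaster,
the regionserver ZNodes and HRegionServer#run. It also assumes, for
simplicity, that the master tracks servers by hostname alone and that its
ZNode handling amounts to "register any ZNode name not already online".

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Toy model of the regionserver startup race. Everything here is
 * hypothetical; these are not HBase APIs.
 */
public class StartupOrderingDemo {

    /** Which step the (toy) run() performs first. */
    enum Ordering { ZNODE_FIRST, REPORT_FIRST }

    /** Hypothetical master: its own DNS view of the RS plus the servers it tracks. */
    static final class ToyMaster {
        private final String resolvedName;                 // master's DNS answer for the RS
        final Set<String> onlineServers = new LinkedHashSet<>();
        final List<String> rsZNodes = new ArrayList<>();   // stands in for RS znodes in ZooKeeper

        ToyMaster(String resolvedName) {
            this.resolvedName = resolvedName;
        }

        /** Report for duty: register the re-resolved name and hand it back to the RS. */
        String reportForDuty() {
            onlineServers.add(resolvedName);
            return resolvedName;
        }

        /** ZNode handling: any znode name not yet online is treated as a new server. */
        void scanZNodes() {
            onlineServers.addAll(rsZNodes);
        }
    }

    /** Runs one startup and returns the hostnames the master ends up tracking. */
    static Set<String> startUp(Ordering ordering, String localName, String masterView) {
        ToyMaster master = new ToyMaster(masterView);
        String hostname = localName;                // step 1: local lookup (e.g. /etc/hosts)
        if (ordering == Ordering.ZNODE_FIRST) {
            master.rsZNodes.add(hostname);          // step 2: znode carries the local name
            hostname = master.reportForDuty();      // steps 3-5: RS adopts the master's name
        } else {
            hostname = master.reportForDuty();      // option 2: adopt the master's name first
            master.rsZNodes.add(hostname);          // ...then create the znode with that name
        }
        master.scanZNodes();                        // step 6: master registers unseen znodes
        return master.onlineServers;
    }

    public static void main(String[] args) {
        System.out.println(startUp(Ordering.ZNODE_FIRST, "hostA.external", "hostA.internal"));
        System.out.println(startUp(Ordering.REPORT_FIRST, "hostA.external", "hostA.internal"));
    }
}
```

Running it prints `[hostA.internal, hostA.external]` for the current ordering
(one physical server, two registrations) and `[hostA.internal]` for the
reordered one. When both DNS views agree, either ordering registers the
server once, which would match the observation that the problem only appears
with split DNS.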

The latter seems like the better fix, since it doesn't remove any
functionality. Of course, there might be historical reasons for this
ordering that I am not aware of.

bq. To my knowledge, latest release was 1.1.0. The release before that was
1.0.1

I went to http://www.apache.org/dyn/closer.cgi/hbase/, chose a mirror, and
picked the latest release listed there: http://mirror.metrocast.net/apache/hbase/

I just verified that the snippets of the run() function I've referred to
are essentially identical in the latest stable release,
http://mirror.metrocast.net/apache/hbase/stable/ (1.0.1.1), as well.

On Fri, May 22, 2015 at 3:34 PM, Ted Yu <yuzhihong@gmail.com> wrote:

> bq. hbase-1.1.0.1
>
> To my knowledge, latest release was 1.1.0. The release before that was
> 1.0.1
>
> Can you clarify?
>
> Thanks
>
> On Fri, May 22, 2015 at 12:23 PM, Bryan Beaudreault <
> bbeaudreault@hubspot.com> wrote:
>
> > Thank you, Esteban. I checked two different versions:
> >
> > - hbase-1.0.0-cdh5.4.0 (this is the version I use)
> > - hbase-1.1.0.1 (just wanted to check the latest release)
> >
> > On Fri, May 22, 2015 at 3:13 PM, Esteban Gutierrez <esteban@cloudera.com>
> > wrote:
> >
> > > Hi Bryan,
> > >
> > > Could you please be more specific about which 1.x version you are
> > > using? We have HBASE-13481 and HBASE-12954, so it depends on which
> > > version of 1.x you are using.
> > >
> > > Regarding your account issue, I have created an INFRA JIRA on your
> > > behalf to look into it.
> > >
> > > thanks,
> > > esteban.
> > >
> > >
> > >
> > > --
> > > Cloudera, Inc.
> > >
> > >
> > > On Fri, May 22, 2015 at 10:17 AM, Bryan Beaudreault <
> > > bbeaudreault@hubspot.com> wrote:
> > >
> > > > In our system, each server has two DNS names associated with it: one
> > > > always points to a private address, and the other points to a public
> > > > or private address depending on the context.
> > > >
> > > > This issue did not show up in 0.94.x, but it is showing up on my new
> > > > 1.x cluster. Basically, it goes like this:
> > > >
> > > > 1. The regionserver starts up and looks up its hostname, which returns
> > > > `hostA.external` because of our /etc/hosts.
> > > > 2. The regionserver registers itself in ZooKeeper as `hostA.external`.
> > > > 3. The regionserver reports for duty to the HMaster, which re-resolves
> > > > the DNS and returns `hostA.internal`.
> > > > 4. The HMaster registers the server as `hostA.internal`.
> > > > 5. The regionserver receives the RegionServerStartupResponse, which
> > > > contains `hostA.internal`, and uses that name for its RPCs.
> > > > 6. The HMaster sees a ZNode for `hostA.external`, assumes it is a
> > > > regionserver that hasn't checked in yet, and registers it as well.
> > > >
> > > > So I think the problem is that step #2 happens before step #5. You can
> > > > clearly see this in the HRegionServer.java run() function.
> > > >
> > > > In 0.94, the `createMyEphemeralNode` function was called within
> > > > `handleReportForDutyResponse`. In 1.x, it happens within `run()` BEFORE
> > > > `handleReportForDutyResponse`.
> > > >
> > > >
> > > > I can work around this by handling /etc/hosts specially for my
> > > > regionservers. We have our /etc/hosts file set up like this for a
> > > > reason, but I think I can special-case the regionservers.
> > > >
> > > > However, this seems like a bug: there are mechanisms built in for the
> > > > HMaster to determine the RegionServer hostname, but these mechanisms do
> > > > not account for regionservers getting registered twice when the
> > > > hostname in ZooKeeper does not match the one the HMaster uses.
> > > >
> > > > I tried to create a JIRA for this, but either my username no longer has
> > > > permission to create issues, or I can't find where to create them
> > > > anymore. Any help?
> > > > https://issues.apache.org/jira/secure/ViewProfile.jspa?name=bbeaudreault
> > > >
> > >
> >
>
