hbase-issues mailing list archives

From "Clay B. (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-12954) Ability impaired using HBase on multihomed hosts
Date Tue, 03 Feb 2015 02:19:35 GMT

    [ https://issues.apache.org/jira/browse/HBASE-12954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14302667#comment-14302667 ]

Clay B. commented on HBASE-12954:
---------------------------------

[~apurtell] and [~stack], I think there is some past design I am not catching behind the desire
to split internal and external HBase identification. Is the desire for a "split-view" (internal/external)
of a cluster, for a network with a completely isolated cluster (e.g. public access is not
available from internal networks, but external requests can somehow still reach in and
be answered)? I can't quite envision such a network. Certainly, I've seen interesting issues
running multiple region servers on the same machine, but the port part of the {{RegionServerStatus}}(?)
must be the key disambiguator there. In the CDH3 days (~0.90.4) one could certainly end up
with duplicate registration due to inconsistent DNS/hostfile entries, and that was bad; this
would provide a canonical hostname from the region server should one choose to take matters
into one's own hands.

I would be more concerned, from an operations perspective, that a mapping file (internal and
external /etc/hosts for HBase) or script needs to be identical across a cluster (e.g. it would
need to be updated atomically), versus a configuration in a region server's hbase-site.xml,
which would be a single source of truth for that region server only and, if incorrect, would
only affect that region server (ideally, if we can figure out a good way to prevent potential
duplicate registration; otherwise duplicate registration could be a problem). It seems that
a script would end up needing to query some atomic single source of truth like Zookeeper,
Consul or etcd in the end anyway (as a master may move at any time, e.g. due to a hardware
failure, and one may want to change the hostname of one or more region servers), versus
distributing responsibility to the region servers and having a good check for duplicate
registration. (Perhaps this could implement some UUID generation scheme, as was suggested in
the similar HBASE-3413, if protecting users from themselves is a key concern. Also, since work
like HBASE-5844 has come about since the duplicate registration days, is this as big of a
problem? Would we delete the znode of a competing region server?)
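
To make the "good check for duplicate registration" a little more concrete, here is a minimal Python sketch. It is not HBase code. It assumes a region server is identified by hostname, port and start code (the three parts of an HBase server name) and uses a plain in-memory dictionary as a stand-in for the registry in Zookeeper. The names {{Registry}}, {{register}}, {{expire}} and {{DuplicateRegistration}} are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


class DuplicateRegistration(Exception):
    """Raised when another live server already advertises the same host:port."""


@dataclass(frozen=True)
class ServerName:
    hostname: str    # the hostname the server advertises
    port: int        # disambiguates several servers on one machine
    start_code: int  # distinguishes one process incarnation from another


class Registry:
    """In-memory stand-in for the registry of live region servers (assumption)."""

    def __init__(self):
        self._live = {}  # (lower-cased hostname, port) -> ServerName

    @staticmethod
    def _key(server):
        # Hostnames compare case-insensitively, so normalize before lookup.
        return (server.hostname.lower(), server.port)

    def register(self, server):
        current = self._live.get(self._key(server))
        if current is not None and current.start_code != server.start_code:
            # A different process already claims this hostname and port.
            raise DuplicateRegistration(
                f"{server.hostname}:{server.port} is already held by "
                f"start code {current.start_code}"
            )
        self._live[self._key(server)] = server

    def expire(self, server):
        # Remove the entry only if it still belongs to this exact server.
        if self._live.get(self._key(server)) == server:
            del self._live[self._key(server)]
```

The sketch rejects the newcomer and leaves the existing holder alone. Evicting the existing holder instead (the "delete the znode" option) would be the other possible policy; which one is right is the open question above.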

Again, following the line of a mapping script/file, would the thought be that the master enforces
some re-registration process, so that if hosta 1.1.1.1 becomes hostb, which was 1.1.1.2, we don't
allow all sorts of havoc to come about because region server identities were changed in the
mapping file but no region server restart was performed? (Further, how would this be handled
gracefully, and how would the administrator handle coordination across a cluster?)
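
As a rough illustration of the mapping-file concern, the following Python sketch compares two versions of a hypothetical IP-to-hostname mapping and reports every address whose advertised identity changed. Those are the servers that would need some form of re-registration. The mapping format and the function name {{changed_identities}} are assumptions made for this example, not an existing HBase facility.

```python
def changed_identities(old, new):
    """Return {ip: (old_hostname, new_hostname)} for every IP whose entry differs.

    `old` and `new` map IP address strings to advertised hostnames.
    A hostname of None means the IP has no entry in that version.
    """
    changes = {}
    for ip in sorted(set(old) | set(new)):
        before, after = old.get(ip), new.get(ip)
        if before != after:
            changes[ip] = (before, after)
    return changes


# The case from the paragraph above: hosta (1.1.1.1) is renamed to hostb,
# a name that used to belong to 1.1.1.2.
old_mapping = {"1.1.1.1": "hosta", "1.1.1.2": "hostb"}
new_mapping = {"1.1.1.1": "hostb"}
pending = changed_identities(old_mapping, new_mapping)
```

Here {{pending}} would list both 1.1.1.1 and 1.1.1.2. A region server still running under the old identity for either address is exactly the situation the master would have to detect.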

> Ability impaired using HBase on multihomed hosts
> ------------------------------------------------
>
>                 Key: HBASE-12954
>                 URL: https://issues.apache.org/jira/browse/HBASE-12954
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.98.4
>            Reporter: Clay B.
>            Assignee: Ted Yu
>            Priority: Minor
>         Attachments: 12954-v1.txt, Hadoop Three Interfaces.png
>
>
> For HBase clusters running on unusual networks (such as NAT'd cloud environments or physical
> machines with multiple IPs per network interface) it would be ideal to have a way to
> specify both:
> # which IP interface an HBase master or region-server will bind to
> # what hostname HBase will advertise in Zookeeper for a master or region-server
> process
> While efforts such as HBASE-8640 go a long way to normalize these two sources of information,
> the properties currently available to an administrator do not allow these to be
> specified unambiguously.
> One can set {{hbase.master.ipc.address}} or {{hbase.regionserver.ipc.address}},
> but one cannot specify the desired {{hbase.master.hostname}}. (It was removed in HBASE-1357;
> further, I am unaware of a region-server equivalent.)
> I use a configuration management system to generate all of my configuration files on
> a per-machine basis. As such, an option to generate a file specifying exactly which hostname
> to use would be helpful.
> Today, specifying the bind address for HBase works, and one can use an HBase-only DNS
> for faking what to put in Zookeeper, but this is far from ideal. Network interfaces have no
> intrinsic IP address or hostname. Specifying a DNS server is awkward, as the DNS server may
> differ from the system's resolver and is a single IP address. Similarly, on hosts which use
> a transient VIP (e.g. through keepalived) for other services, HBase makes a seemingly
> non-deterministic hostname choice depending on the state of the VIP at daemon
> start-up time.
> I will attach two networking examples I use which become very difficult to manage under
> the current properties.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
