hadoop-hdfs-issues mailing list archives

From "Aaron T. Myers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1973) HA: HDFS clients must handle namenode failover and switch over to the new active namenode.
Date Thu, 04 Aug 2011 20:55:27 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079584#comment-13079584 ]

Aaron T. Myers commented on HDFS-1973:

h3. Client Failover overview

On failover between active and standby NNs, it's necessary for clients to be redirected to
the new active NN. The goal of HDFS-1623 is to provide a framework for HDFS HA which can in
fact support multiple underlying mechanisms. As such, the client failover approach should
support multiple options.

h3. Cases to support

# Proxy-based client failover. Clients always communicate with an in-band proxy service which
forwards all RPCs on to the correct NN. On failure, a process causes this proxy to begin sending
requests to the now-active NN.
# Virtual IP-based client failover. Clients always connect to a hostname which resolves to
a particular IP address. On failure of the active NN, a process is initiated to move this IP
address so that packets sent to it reach the now-active NN. (From a client's perspective,
this case is equivalent to case #1.)
# ZooKeeper-based client failover. The URI to contact the active NN is stored in ZooKeeper
or some other highly-available service. Clients look up which NN to talk to by communicating
with ZK to discern the currently active NN. On failure, some process causes the address stored
in ZK to be changed to point to the now-active NN.
# Configuration-based client failover. Clients are configured with a set of NN addresses to
try until an operation succeeds. This configuration might exist in client-side configuration
files, or perhaps in DNS via a SRV record that lists the NNs with different priorities.

h3. Assumptions

This proposal assumes that NN fencing works, and that after a failover any standby NN is either
unreachable or will throw a {{StandbyException}} on any RPC from a client. That is, a client
cannot receive incorrect results by contacting the wrong NN. This proposal also presumes that
no direct coordination is required between any central failover coordinator and clients, i.e.
there's an intermediate name resolution system of some sort (ZK, DNS, local configuration,
etc.).
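The safety property assumed above can be sketched with a toy stand-in (all names here are illustrative; in particular, Hadoop's real {{StandbyException}} is a checked {{IOException}}, simplified to an unchecked one to keep the sketch short):

```java
// Sketch of the fencing assumption: a standby NN rejects every client RPC
// with a StandbyException rather than answering with possibly-stale data,
// so a client that contacts the wrong NN cannot get incorrect results.
class StandbyException extends RuntimeException {}

class NameNodeStub {
    private final boolean active;

    NameNodeStub(boolean active) { this.active = active; }

    // A read RPC: only the active NN answers; a standby always refuses.
    String getFileInfo(String path) {
        if (!active) {
            throw new StandbyException();
        }
        return "info:" + path;
    }
}
```

The client side can therefore treat a {{StandbyException}} as a pure signal to fail over and retry, rather than as a correctness hazard.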

h3. Proposal

The commit of HADOOP-7380 already introduced a facility whereby an IPC {{RetryInvocationHandler}}
can utilize a {{FailoverProxyProvider}} implementation to perform the appropriate client-side
action in the event of failover. At the moment, the only implementation of a {{FailoverProxyProvider}}
is the {{DefaultFailoverProxyProvider}}, which does nothing in the case of failover. HADOOP-7380
also added an {{@Idempotent}} annotation which can be used to identify which methods can be
safely retried during a failover event.
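A simplified sketch of the contract described above (the names mirror the Hadoop classes mentioned, but the signatures here are illustrative rather than the exact HADOOP-7380 API):

```java
// Simplified sketch of the FailoverProxyProvider contract: the real
// interface lives in org.apache.hadoop.io.retry; signatures illustrative.
interface FailoverProxyProvider<T> {
    Class<T> getInterface();          // RPC protocol class this provider serves
    T getProxy();                     // proxy for the currently-preferred NN
    void performFailover(T current);  // react to a failover event
}

// Analog of DefaultFailoverProxyProvider: always returns the same proxy
// and does nothing on failover, matching the behavior described above.
class DefaultFailoverProxyProvider<T> implements FailoverProxyProvider<T> {
    private final Class<T> iface;
    private final T proxy;

    DefaultFailoverProxyProvider(Class<T> iface, T proxy) {
        this.iface = iface;
        this.proxy = proxy;
    }
    public Class<T> getInterface() { return iface; }
    public T getProxy() { return proxy; }
    public void performFailover(T current) { /* no-op by design */ }
}
```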

What remains, then, is:

# To implement {{FailoverProxyProviders}} which can support the cases outlined above (and
perhaps others).
# To provide a mechanism to select which {{FailoverProxyProvider}} implementation to use for
a given HDFS URI.
# To annotate the appropriate HDFS {{ClientProtocol}} interface methods with the {{@Idempotent}}
annotation.
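The role of the annotation in item 3 can be sketched as follows (the cut-down {{ClientProtocol}} and the {{isRetriable}} helper are stand-ins for illustration, not the real Hadoop code):

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Sketch of the @Idempotent marker: a retry handler may only re-issue a
// method after failover if the method carries this annotation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Idempotent {}

// Stand-in for a slice of ClientProtocol; the real interface has many
// more methods and different signatures.
interface ClientProtocol {
    @Idempotent
    String getBlockLocations(String src);  // read-only: safe to retry

    void append(String src);               // not idempotent: must not blind-retry
}

class RetryPolicy {
    // Hypothetical helper mirroring what a RetryInvocationHandler would
    // check before retrying a failed call against the new active NN.
    static boolean isRetriable(Class<?> protocol, String method, Class<?>... argTypes) {
        try {
            return protocol.getMethod(method, argTypes)
                           .isAnnotationPresent(Idempotent.class);
        } catch (NoSuchMethodException e) {
            return false;
        }
    }
}
```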

h4. {{FailoverProxyProvider}} implementations

Cases 1 and 2 above can be achieved by implementing a single {{FailoverProxyProvider}} which
simply retries the connection to the same hostname/IP address on failover. Cases 3 and 4
can be implemented as distinct custom {{FailoverProxyProviders}}.
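For instance, case 4 could be sketched as a provider that cycles through a configured, ordered list of NN addresses (hypothetical; a real implementation would create RPC proxies rather than return address strings):

```java
import java.util.List;

// Sketch of a configuration-based provider: advance to the next candidate
// NN on each failover, wrapping around. Address strings stand in for real
// RPC proxies to keep the example self-contained.
class ConfiguredFailoverProxyProvider {
    private final List<String> nnAddresses;
    private int current = 0;

    ConfiguredFailoverProxyProvider(List<String> nnAddresses) {
        if (nnAddresses.isEmpty()) {
            throw new IllegalArgumentException("need at least one NN address");
        }
        this.nnAddresses = nnAddresses;
    }

    String getProxy() {
        return nnAddresses.get(current);
    }

    // Called when an RPC fails over: try the next configured NN.
    void performFailover() {
        current = (current + 1) % nnAddresses.size();
    }
}
```

Whether to remember the last-known-good NN across client instances, or always start from the first configured address, is a policy question such a provider would have to answer.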

h4. A mechanism to select the appropriate {{FailoverProxyProvider}} implementation

I propose we add a mechanism to configure a mapping from URI authority -> {{FailoverProxyProvider}}
implementation. Absolute URIs which previously specified the NN host name will instead contain
a logical cluster name (which might be chosen to be identical to one of the NN's host names)
which will be used by the chosen {{FailoverProxyProvider}} to determine the appropriate host
to connect to. Introducing the concept of a cluster name will be a useful abstraction in general:
if, for example, someone develops a fully-distributed NN in the future, the cluster name could
still identify the service as a whole without naming any particular host.

On instantiation of a {{DFSClient}} (or other user of an HDFS URI, e.g. HFTP), the mapping
would be checked to see if there's an entry for the given URI authority. If there is not,
then a normal RPC client with a socket connected to the given authority will be created, as is
done today, with a {{DefaultFailoverProxyProvider}}. If there is an entry, then the authority will
be treated as a logical cluster name, a {{FailoverProxyProvider}} of the correct type will
be instantiated (via a factory class), and an RPC client will be created which utilizes this
{{FailoverProxyProvider}}. The various {{FailoverProxyProvider}} implementations are responsible
for their own configuration.
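The lookup step described above might look roughly like this (the mapping would really live in Hadoop {{Configuration}} and the factory would instantiate classes; a plain map of class names stands in here, and all names are assumptions):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the authority -> FailoverProxyProvider selection: a mapped
// authority is treated as a logical cluster name; anything unmapped falls
// back to the default direct-connection provider.
class ProxyProviderResolver {
    private final Map<String, String> authorityToProviderClass = new HashMap<>();

    void register(String authority, String providerClassName) {
        authorityToProviderClass.put(authority, providerClassName);
    }

    String resolve(String authority) {
        return authorityToProviderClass.getOrDefault(
            authority, "DefaultFailoverProxyProvider");
    }
}
```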

As a straw man example, consider the following configuration:
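One hypothetical shape for such a mapping, in Hadoop-style configuration (both the property-name convention and the provider class name are illustrative assumptions):

```xml
<property>
  <!-- Map the logical cluster name (URI authority) to a provider class.
       The property naming scheme shown here is hypothetical. -->
  <name>dfs.client.failover.proxy.provider.cluster1.foo.com</name>
  <value>ZookeeperFailoverProxyProvider</value>
</property>
```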



This would cause URIs which begin with {{hdfs://cluster1.foo.com}} to use the {{ZookeeperFailoverProxyProvider}}.
Slash-relative URIs would also default to using this. An absolute URI which, for example,
referenced an NN in another cluster (e.g. {{nn.cluster2.foo.com}}) which was not HA-enabled
would default to using the {{DefaultFailoverProxyProvider}}.

h3. Questions

# I believe this scheme will work transparently with viewfs. Instead of configuring the mount
table to communicate with a particular NN for a given portion of the name space, one would
configure viewfs to use the logical cluster name, which when paired with the configuration
from URI authority -> {{FailoverProxyProvider}} will cause the appropriate {{FailoverProxyProvider}}
to be selected and the appropriate NN to be located. I'm no viewfs expert and so would love
to hear any thoughts on this.
# Are there any desirable client failover mechanisms I'm forgetting about?
# I'm sure there are places (which I haven't fully identified yet) where host names are cached
client side. Those may need to get changed as well.
# Federation already introduced a concept of a "cluster ID", but in Federation this is not
intended to be user-facing. Should we combine this notion with the "logical cluster name"
I described above?

> HA: HDFS clients must handle namenode failover and switch over to the new active namenode.
> ------------------------------------------------------------------------------------------
>                 Key: HDFS-1973
>                 URL: https://issues.apache.org/jira/browse/HDFS-1973
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Suresh Srinivas
>            Assignee: Aaron T. Myers
> During failover, a client must detect the current active namenode failure and switch
over to the new active namenode. The switch over might make use of IP failover or something
more elaborate such as zookeeper to discover the new active.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

