hadoop-hdfs-issues mailing list archives

From "Aaron T. Myers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1973) HA: HDFS clients must handle namenode failover and switch over to the new active namenode.
Date Tue, 23 Aug 2011 22:09:29 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089807#comment-13089807 ]

Aaron T. Myers commented on HDFS-1973:

bq. Alternatively client gets the address of both the namenodes. Tries them one at a time
until it gets connected to the new active.

This is what I was trying to communicate with "Configuration-based client failover. Clients
are configured with a set of NN addresses to try until an operation succeeds." I think we're
in agreement on this point, just talking past each other a little bit. :)
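To make the configuration-based scheme concrete, here's a rough sketch of the try-each-address loop in Java. The class and method names here are made up purely for illustration (this is not an actual HDFS API); the real client would be issuing RPCs rather than calling a function:

```java
import java.util.List;
import java.util.function.Function;

/**
 * Illustrative sketch of configuration-based client failover: the client
 * is handed a list of NN addresses and tries them one at a time until an
 * operation succeeds. Names (invokeWithFailover, the example hosts) are
 * hypothetical, not real HDFS classes or endpoints.
 */
public class ConfiguredFailover {

    /** Try each configured NN address in order until one succeeds. */
    static <T> T invokeWithFailover(List<String> nnAddresses,
                                    Function<String, T> operation) {
        RuntimeException last = null;
        for (String addr : nnAddresses) {
            try {
                return operation.apply(addr);   // e.g. an RPC to this NN
            } catch (RuntimeException e) {
                last = e;                       // standby/dead NN: try the next one
            }
        }
        throw new IllegalStateException("No active namenode found", last);
    }

    public static void main(String[] args) {
        // Simulate nn1 being in standby and nn2 being active.
        String result = invokeWithFailover(
            List.of("nn1.example.com:8020", "nn2.example.com:8020"),
            addr -> {
                if (addr.startsWith("nn1")) {
                    throw new RuntimeException("StandbyException from " + addr);
                }
                return "connected to " + addr;
            });
        System.out.println(result);  // connected to nn2.example.com:8020
    }
}
```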

bq. Proxy-based client failover is an implementation detail. It still needs to figure out
the new active based on one of the schemes above.

The proxy process would need to figure out the address of the new active, but clients wouldn't;
the clients would just have the address of the proxy. The only thing for the client to do,
then, would be to retry the RPC to the same address (the address of the proxy).

bq. +1 for logical URI. We could consider merging this requirement with HDFS-2231 to do this.

Good point. I'll comment there.

bq. Logical URI is needed for identifying a nameservice and not a cluster, since federation
supports multiple namenodes within a cluster.

Good point. In the above design document: s/cluster/nameservice/g.

bq. Why should failover method be based on URI cluster part? Can it be a single mechanism
across all the nameservices? Hence change the parameter to dfs.client.ha.failover.method?

Imagine that one writes a program which uses absolute URIs to connect to two distinct clusters,
one of which is HA-enabled using ZK to resolve the address, and the other is not. In this
case we should use some ZK-based {{FailoverProxyProvider}} for the first, and just the normal
RPC connection for the second. Thus, the configuration should be per-nameservice. I suppose
we could do something like introduce {{dfs.client.ha.failover.method.<nameservice identifier>}},
but to me that seems more annoying to configure.
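For illustration, a per-nameservice setting of the kind discussed above might look like the following hdfs-site.xml fragment. The key name follows the {{dfs.client.ha.failover.method.<nameservice identifier>}} pattern proposed in this thread; the nameservice IDs ("zkCluster", "plainCluster") and the provider class name are made up, and none of this is settled configuration:

```xml
<!-- Hypothetical hdfs-site.xml fragment: per-nameservice failover method.
     Key names, nameservice IDs, and the class name are illustrative only. -->
<property>
  <name>dfs.client.ha.failover.method.zkCluster</name>
  <value>org.apache.hadoop.hdfs.ZKFailoverProxyProvider</value>
</property>

<!-- The non-HA nameservice ("plainCluster") simply has no failover method
     configured; the client falls back to a normal direct RPC connection. -->
```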

bq. The scheme you have defined works only for RPC protocols. How about HTTP?

Yes, that's certainly true. My thinking there was that since it's generally less critical
for the NN web interfaces to fail over immediately, and since we don't generally control the
HTTP clients which access the NN web interface, this could be out of scope for this JIRA. To
facilitate this, the operator could either run a standard HTTP proxy, use round-robin DNS,
or even change the DNS resolution of the NN and wait for clients to pick up the updated address.

bq. I am not sure why logical URI is required for VIP/failover based setup.

The value of the logical URI could be the same as the actual URI of the proxy. It would then
only be used to configure an appropriate {{FailoverProxyProvider}} which would retry failed
RPCs to the same address.

> HA: HDFS clients must handle namenode failover and switch over to the new active namenode.
> ------------------------------------------------------------------------------------------
>                 Key: HDFS-1973
>                 URL: https://issues.apache.org/jira/browse/HDFS-1973
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Suresh Srinivas
>            Assignee: Aaron T. Myers
> During failover, a client must detect the current active namenode's failure and switch
> over to the new active namenode. The switch over might make use of IP failover or something
> more elaborate such as ZooKeeper to discover the new active.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

