hadoop-hdfs-issues mailing list archives

From "Bikas Saha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7858) Improve HA Namenode Failover detection on the client using Zookeeper
Date Sat, 28 Feb 2015 07:26:04 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341399#comment-14341399 ]

Bikas Saha commented on HDFS-7858:

bq. The client will proceed to connect to that NN first (thereby removing non-determinism
from the current scheme).. and will most probably succeed. It will contact ZK only if the
connection was unsuccessful..
Yes. It will most probably succeed. But when will it not succeed? When that NN has failed
over or has crashed, right? Which means that every time a known primary NN becomes unavailable
there will be a surge of failed connections to it (from cached entries that point to it), and
only then will these connections be redirected to ZK. As a proxy for the number of connections,
consider MR jobs: every Map task running on every machine has a DFS client to read
from HDFS, and every Reduce task on every machine has a DFS client to write to HDFS. MR tasks
are typically short-lived clients.

> Improve HA Namenode Failover detection on the client using Zookeeper
> --------------------------------------------------------------------
>                 Key: HDFS-7858
>                 URL: https://issues.apache.org/jira/browse/HDFS-7858
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Arun Suresh
>            Assignee: Arun Suresh
> In an HA deployment, Clients are configured with the hostnames of both the Active and
Standby Namenodes. Clients will first try one of the NNs (non-deterministically), and if it is
a standby NN, it will respond to the client to retry the request on the other Namenode.
> If the client happens to talk to the Standby first, and the standby is undergoing some
GC / is busy, then those clients might not get a response soon enough to try the other NN.
> Proposed Approach to solve this :
> 1) Since Zookeeper is already used as the failover controller, the clients could talk
to ZK and find out which is the active namenode before contacting it.
> 2) Long-lived DFSClients would have a ZK watch configured which fires when there is a
failover, so they do not have to query ZK every time to find out the active NN
> 3) Clients can also cache the last active NN in the user's home directory (~/.lastNN)
so that short-lived clients can try that Namenode first before querying ZK
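
The lookup order in points 1) and 3) above could be sketched roughly as below. All class and method names here are illustrative, not the actual DFSClient API, and the ZooKeeper query is abstracted as a plain Supplier rather than a real ZK client call:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.function.Predicate;
import java.util.function.Supplier;

/**
 * Hypothetical sketch of the client-side lookup order proposed in HDFS-7858:
 * try the NN cached in ~/.lastNN first, and fall back to ZooKeeper only when
 * that first attempt fails.
 */
public class LastActiveNnCache {
    private final Path cacheFile;

    public LastActiveNnCache(Path homeDir) {
        this.cacheFile = homeDir.resolve(".lastNN");
    }

    /** Returns the cached last-active NN address, if the cache file exists. */
    public Optional<String> read() {
        try {
            return Optional.of(Files.readString(cacheFile).trim());
        } catch (IOException e) {
            return Optional.empty();  // no cache yet; caller falls back to ZK
        }
    }

    /** Records the NN the client last connected to successfully. */
    public void write(String nnAddress) throws IOException {
        Files.writeString(cacheFile, nnAddress);
    }

    /**
     * Try the cached NN first; query ZK (the Supplier stands in for it) only
     * if the cache is empty or the cached NN is unreachable.
     */
    public static String pickNamenode(LastActiveNnCache cache,
                                      Predicate<String> canConnect,
                                      Supplier<String> zkLookup) {
        Optional<String> cached = cache.read();
        if (cached.isPresent() && canConnect.test(cached.get())) {
            return cached.get();   // deterministic first attempt, no ZK round trip
        }
        return zkLookup.get();     // cache miss or stale entry: ask ZK
    }
}
```

Note this is exactly the scheme the comment above critiques: every cached entry pointing at a failed NN produces one failed connection attempt before the client redirects to ZK.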

This message was sent by Atlassian JIRA
