hadoop-hdfs-issues mailing list archives

From "Karthik Kambatla (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7858) Improve HA Namenode Failover detection on the client
Date Sat, 14 Mar 2015 02:33:38 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361536#comment-14361536
] 

Karthik Kambatla commented on HDFS-7858:
----------------------------------------

If possible, it would be nice to make the solution here accessible to YARN as well. 

Simultaneously connecting to all the masters (NNs in HDFS and RMs in YARN) might work most
of the time. How do we plan to handle a split-brain? In YARN, we don't use an explicit fencing
mechanism. IIRR, one is not required to configure a fencing mechanism when using QJM? 
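
To make the split-brain concern concrete, here is a minimal sketch in Python (not taken from any of the attached patches) of a client that probes all masters at once. The names `find_active` and `probe` are hypothetical; `probe` stands in for whatever RPC reports a master's HA state and is assumed to enforce its own timeout. The sketch only detects two masters claiming to be active. It does not resolve the conflict, which is the open question above.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def find_active(masters, probe):
    """Probe every master concurrently and return the one reporting "active".

    `probe` is a hypothetical callable: probe(master) -> "active" | "standby".
    It is assumed to raise on timeout or connection failure.
    Returns None if no master reports active.
    Raises RuntimeError if more than one master claims to be active.
    """
    if not masters:
        return None
    active = []
    with ThreadPoolExecutor(max_workers=len(masters)) as pool:
        futures = {pool.submit(probe, m): m for m in masters}
        for fut in as_completed(futures):
            try:
                state = fut.result()
            except Exception:
                continue  # unreachable or slow master: treat as not active
            if state == "active":
                active.append(futures[fut])
    if len(active) > 1:
        # Two masters both claim to be active: report it instead of guessing.
        raise RuntimeError("split-brain suspected: %s" % sorted(active))
    return active[0] if active else None
```

Note that this version waits for every probe to finish so that it can notice a second active master. Returning on the first "active" answer would be faster, but it would silently pick whichever master responded first in a split-brain.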


> Improve HA Namenode Failover detection on the client
> ----------------------------------------------------
>
>                 Key: HDFS-7858
>                 URL: https://issues.apache.org/jira/browse/HDFS-7858
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Arun Suresh
>            Assignee: Arun Suresh
>         Attachments: HDFS-7858.1.patch, HDFS-7858.2.patch, HDFS-7858.2.patch, HDFS-7858.3.patch
>
>
> In an HA deployment, clients are configured with the hostnames of both the Active and
Standby Namenodes. A client will first try one of the NNs (non-deterministically), and if it's
a standby NN, it will respond to the client to retry the request on the other Namenode.
> If the client happens to talk to the Standby first, and the Standby is undergoing some
GC / is busy, then the client might not get a response soon enough to try the other NN.
> Proposed approach to solve this:
> 1) Since ZooKeeper is already used as the failover controller, the clients could talk
to ZK and find out which is the active Namenode before contacting it.
> 2) Long-lived DFSClients would have a ZK watch configured which fires when there is a
failover, so they do not have to query ZK every time to find out the active NN.
> 3) Clients can also cache the last active NN in the user's home directory (~/.lastNN)
so that short-lived clients can try that Namenode first before querying ZK.
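
The lookup order in the proposal above (cached NN first, then ZK) could be sketched roughly as follows. This is an illustration in Python, not code from the attached patches. `resolve_active_nn`, `is_active` and `query_zk` are hypothetical names: `is_active` stands in for a cheap check against a Namenode and `query_zk` for the ZooKeeper lookup. The ZK watch for long-lived clients is left out.

```python
import os


def resolve_active_nn(cache_path, is_active, query_zk):
    """Return the active Namenode, trying the cached one before asking ZK.

    cache_path: file holding the last known active NN (e.g. ~/.lastNN).
    is_active:  hypothetical callable, is_active(nn) -> bool.
    query_zk:   hypothetical callable, query_zk() -> str (active NN name).
    """
    cached = None
    try:
        with open(cache_path) as f:
            cached = f.read().strip() or None
    except OSError:
        pass  # no cache yet, or it is unreadable: fall through to ZK

    if cached and is_active(cached):
        return cached

    # Cache miss or stale entry (for example after a failover): ask ZK.
    active = query_zk()

    # Write to a temp file and rename so readers never see a partial entry.
    tmp_path = cache_path + ".tmp"
    with open(tmp_path, "w") as f:
        f.write(active + "\n")
    os.replace(tmp_path, cache_path)
    return active
```

A short-lived client would call this once at startup. If the cached entry is still the active NN, ZK is never contacted; after a failover the stale entry fails the `is_active` check and is overwritten with the answer from ZK.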



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
