hadoop-hdfs-issues mailing list archives

From "Vinayakumar B (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7858) Improve HA Namenode Failover detection on the client
Date Fri, 24 Jul 2015 08:15:05 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14640133#comment-14640133 ]

Vinayakumar B commented on HDFS-7858:

I have a small question here.
I believe all client operations can succeed only against the Active NameNode.
With the current {{ConfiguredFailoverProxyProvider}}, the client needs to try both nodes
only at the beginning, when it initializes, and only if the standby happens to be tried first.
During failover, if the ANN goes down and the SNN has not yet become active, the client has
to go back to the previous ANN and then return to the current SNN to check for the failover
once more.
Once a working proxy is found, all subsequent requests go there.
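
To make the sequence above concrete, here is a minimal sketch of the try-one-then-fail-over loop as I understand it. This is plain Java, not the actual {{ConfiguredFailoverProxyProvider}} code: the class {{SequentialFailoverSketch}}, the {{NameNode}} interface and the {{maxAttempts}} parameter are hypothetical names used only for illustration.

```java
import java.util.List;

class SequentialFailoverSketch {
    /** Hypothetical stand-in for a NameNode proxy; a standby or dead node throws. */
    public interface NameNode {
        String call(String request) throws Exception;
    }

    private final List<NameNode> proxies;
    private int current = 0; // index of the proxy currently in use

    public SequentialFailoverSketch(List<NameNode> proxies) {
        this.proxies = proxies;
    }

    public int currentIndex() {
        return current;
    }

    /** Tries the current proxy; on failure moves to the next one and retries. */
    public String invoke(String request, int maxAttempts) {
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                // On success the index stays put, so later calls go straight here.
                return proxies.get(current).call(request);
            } catch (Exception e) {
                last = e;
                // Fail over: wrap around, so a dead ANN is revisited if the SNN
                // has not become active yet.
                current = (current + 1) % proxies.size();
            }
        }
        throw new IllegalStateException(
            "no NameNode answered after " + maxAttempts + " attempts", last);
    }

    public static void main(String[] args) {
        NameNode standby = r -> { throw new IllegalStateException("standby"); };
        NameNode active = r -> "ok:" + r;
        SequentialFailoverSketch client =
            new SequentialFailoverSketch(List.of(standby, active));
        System.out.println(client.invoke("getFileInfo", 4)); // prints "ok:getFileInfo"
        System.out.println(client.currentIndex());           // prints "1"
    }
}
```

In this sketch the extra cost is paid only when the proxy at the current index stops answering; every other call is a single request.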

With the proposed {{RequestHedgingProxyProvider}}, it is only at the beginning that there
is no failed proxy; at that point hedged requests go to both NNs.
During failover, the currently failed proxy (the previous ANN) is ignored for hedged requests,
i.e. during an HA failover only one request (to the SNN) is issued per hedged invocation. Am
I right?
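
If my reading is right, the behaviour would look roughly like the sketch below. It models only what I described above, not the code in the attached patches: {{HedgedInvokerSketch}}, {{NameNode}} and {{Answer}} are hypothetical names, and I use {{ExecutorService.invokeAny}} from the JDK simply as a convenient way to take the first successful reply.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class HedgedInvokerSketch {
    /** Hypothetical stand-in for a NameNode proxy; a standby or dead node throws. */
    public interface NameNode {
        String call(String request) throws Exception;
    }

    /** Pairs a reply with the index of the proxy that produced it. */
    private static final class Answer {
        final int index;
        final String value;
        Answer(int index, String value) { this.index = index; this.value = value; }
    }

    private final List<NameNode> proxies;
    private int known = -1; // index of the last proxy that answered, -1 if none yet

    public HedgedInvokerSketch(List<NameNode> proxies) {
        this.proxies = proxies;
    }

    public int knownIndex() {
        return known;
    }

    public String invoke(String request) {
        int failed = -1;
        if (known >= 0) {
            try {
                return proxies.get(known).call(request); // steady state: one request
            } catch (Exception e) {
                failed = known; // previous ANN: skip it in the hedged round below
                known = -1;
            }
        }
        // Hedged round: one concurrent request per proxy that has not just failed.
        List<Callable<Answer>> tasks = new ArrayList<>();
        for (int i = 0; i < proxies.size(); i++) {
            if (i == failed) continue;
            final int idx = i;
            tasks.add(() -> new Answer(idx, proxies.get(idx).call(request)));
        }
        if (tasks.isEmpty()) {
            throw new IllegalStateException("no NameNode left to try");
        }
        ExecutorService pool = Executors.newFixedThreadPool(tasks.size());
        try {
            Answer winner = pool.invokeAny(tasks); // first successful reply wins
            known = winner.index;
            return winner.value;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while hedging", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("no NameNode answered", e);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

With two NameNodes the hedged round after a failure contains a single task, which is the case I am asking about; with three or more it still fans out to all remaining proxies.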

So I feel that {{ConfiguredFailoverProxyProvider}} and {{RequestHedgingProxyProvider}}
behave the same way, except on the very first call. And yes, if there are more than 2 proxies
to try, then {{RequestHedgingProxyProvider}} will be the better choice.

Am I missing something here?

> Improve HA Namenode Failover detection on the client
> ----------------------------------------------------
>                 Key: HDFS-7858
>                 URL: https://issues.apache.org/jira/browse/HDFS-7858
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Arun Suresh
>            Assignee: Arun Suresh
>              Labels: BB2015-05-TBR
>         Attachments: HDFS-7858.1.patch, HDFS-7858.2.patch, HDFS-7858.2.patch, HDFS-7858.3.patch,
HDFS-7858.4.patch, HDFS-7858.5.patch, HDFS-7858.6.patch, HDFS-7858.7.patch, HDFS-7858.8.patch,
> In an HA deployment, clients are configured with the hostnames of both the Active and
Standby Namenodes. A client will first try one of the NNs (non-deterministically), and if it's
a standby NN, it will respond to the client to retry the request on the other Namenode.
> If the client happens to talk to the Standby first, and the standby is undergoing some
GC / is busy, then the client might not get a response soon enough to try the other NN.
> Proposed Approach to solve this :
> 1) Since Zookeeper is already used as the failover controller, the clients could talk
to ZK and find out which is the active namenode before contacting it.
> 2) Long-lived DFSClients would have a ZK watch configured which fires when there is a
failover, so they do not have to query ZK every time to find out the active NN
> 3) Clients can also cache the last active NN in the user's home directory (~/.lastNN)
so that short-lived clients can try that Namenode first before querying ZK

This message was sent by Atlassian JIRA
