hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2185) HA: HDFS portion of ZK-based FailoverController
Date Wed, 04 Apr 2012 21:01:25 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246729#comment-13246729 ]

Todd Lipcon commented on HDFS-2185:

Hi Mingjie. Thanks for taking a look.

The idea for the chain of RPCs is from talking with some folks here who work on Hadoop deployment.
Their opinion was the following: currently, most of the Hadoop client tools are too "thick".
For example, in the current manual failover implementation, the fencing is run on the admin
client. This means that you have to run the haadmin command from a machine that has access
to all of the necessary fencing scripts, key files, etc. That's a little bizarre -- you would
expect to configure these kinds of things only in a central location, not on the client.

So, we decided that it makes sense to push the management of the whole failover process into
the FCs themselves, and just use a single RPC to kick off the whole failover process. This
keeps the client "thin".
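
To make the "thin client" idea concrete, here is a minimal in-memory sketch. It is not the ZKFC code or its RPC interface: the class ToyFailoverController and the names cede_active, graceful_failover, pair and thin_client_failover are all hypothetical, chosen only to show that the client makes a single call and the FCs do the rest between themselves.

```python
class ToyFailoverController:
    """In-memory stand-in for a failover controller (FC); not the real ZKFC API."""

    def __init__(self, name, state="standby"):
        self.name = name
        self.state = state
        self.peer = None  # the other FC, set by pair()

    def cede_active(self):
        # FC-to-FC call: the current active steps down when asked.
        was_active = self.state == "active"
        self.state = "standby"
        return was_active

    def graceful_failover(self):
        # The only call the admin client makes. In this model, fencing
        # scripts and key files would live with the FC, not the client.
        if self.state == "active":
            return f"{self.name} is already active"
        self.peer.cede_active()
        self.state = "active"
        return f"{self.name} is now active"


def pair(fc_a, fc_b):
    # Wire two toy FCs together so each can reach the other.
    fc_a.peer, fc_b.peer = fc_b, fc_a


def thin_client_failover(target_fc):
    # The client holds no fencing configuration; it only names the target.
    return target_fc.graceful_failover()
```

In this model the admin machine needs nothing beyond the ability to reach the target FC; everything that would touch fencing configuration stays inside graceful_failover.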

As for your proposed alternative, here are a few thoughts:

bq. existing manual fo code can be kept mostly
We actually share much of the code already. The problem with using the existing code exactly
as is, though, is that the failover controllers always expect to have complete "control" over
the system. If the state of the NNs changes underneath the ZKFC, then the state in ZK will
become inconsistent with the actual state of the system, and it's very easy to get into
split-brain scenarios. So, the idea is that, when auto-failover is enabled, *all* decisions must
be made by ZKFCs. That way we can make sure the ZK state doesn't get out of sync.

bq. although new RPC is added to ZKFC but we don't need them to talk to each other. the manual
failover logic is all handled at client – haadmin.
As noted above, I think this is a con, not a pro, because it requires configuring fencing scripts
at the client, and likely also requires giving the client read-write access to ZK.

bq. easier to extend to the case of multiple standby NNs

I think the extension path to multiple standbys is actually equally easy with both approaches.
The solution in the ZKFC-managed implementation is to add a new znode like "PreferredActive"
and have nodes avoid becoming active unless they're listed as preferred. The target node of
the failover can just set itself to be preferred before asking the other node to cede the
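
As a rough illustration of the "PreferredActive" idea with more than one standby, the sketch below models ZK as a plain dict. The znode paths, the separate lock entry, and the function names are assumptions made up for this example; they are not taken from the patch or the design doc.

```python
PREFERRED = "/toy-ha/PreferredActive"  # hypothetical znode path
LOCK = "/toy-ha/ActiveLock"            # hypothetical znode path


def try_become_active(znodes, node_id):
    """Take the active lock unless another node is listed as preferred."""
    preferred = znodes.get(PREFERRED)
    if preferred is not None and preferred != node_id:
        return False  # someone else is preferred: stay standby
    if LOCK not in znodes:
        znodes[LOCK] = node_id
    return znodes[LOCK] == node_id


def failover_to(znodes, target_id):
    """Target marks itself preferred, then the old active's lock is dropped."""
    znodes[PREFERRED] = target_id
    znodes.pop(LOCK, None)  # stands in for the old active stepping down
    return try_become_active(znodes, target_id)
```

With three nodes, a failover targeted at nn3 leaves nn2 unable to grab the lock in the gap, because nn2 sees that it is not the preferred node.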

Some other advantages that I probably didn't explain well in the design doc:
- this design is fault tolerant. If the "target" node crashes in the middle of the process,
then the old active will automatically regain the active state after its "rejoin" timeout
elapses. With a client-managed setup, a well-meaning admin may ^C the process in the middle
and leave the system with no active at all.
- no need to introduce "disable/enable" to auto-failover. Just having both nodes quit the
election wouldn't work, since one would end up quitting "before" the other, causing a blip
where an unnecessary (random) failover occurs. We could carefully orchestrate the order
of quitting, so the active quits last, but I think it still gets complicated.
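
The fault-tolerance point in the first bullet can be sketched with a toy election driven by a logical clock. The class ToyElection, its methods, and the timeout value are hypothetical; this is only a model of the "rejoin" timeout behaviour described above, not the actual implementation.

```python
class ToyElection:
    """Toy leader election driven by a logical clock; not the real elector."""

    def __init__(self):
        self.active = None
        self.excluded_until = {}  # node -> time at which it may rejoin

    def cede(self, node, now, rejoin_timeout):
        # The old active steps down and sits out until the timeout elapses.
        if self.active == node:
            self.active = None
        self.excluded_until[node] = now + rejoin_timeout

    def try_acquire(self, node, now):
        # A node still inside its rejoin timeout may not compete.
        if now < self.excluded_until.get(node, 0):
            return False
        if self.active is None:
            self.active = node
        return self.active == node

    def crash(self, node):
        # A crashed node loses the active state if it held it.
        if self.active == node:
            self.active = None


def target_crashes_mid_failover(rejoin_timeout=10):
    """nn1 cedes to nn2, nn2 crashes before acquiring, nn1 recovers later."""
    election = ToyElection()
    election.try_acquire("nn1", now=0)
    election.cede("nn1", now=0, rejoin_timeout=rejoin_timeout)
    election.crash("nn2")  # target dies before it ever becomes active
    too_early = election.try_acquire("nn1", now=rejoin_timeout - 1)
    recovered = election.try_acquire("nn1", now=rejoin_timeout)
    return too_early, recovered, election.active
```

In this model nobody has to notice the crash or re-run a command: once the timeout passes, the old active simply rejoins and takes the active state back.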
> HA: HDFS portion of ZK-based FailoverController
> -----------------------------------------------
>                 Key: HDFS-2185
>                 URL: https://issues.apache.org/jira/browse/HDFS-2185
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: auto-failover, ha
>    Affects Versions: 0.24.0, 0.23.3
>            Reporter: Eli Collins
>            Assignee: Todd Lipcon
>             Fix For: Auto failover (HDFS-3042)
>         Attachments: Failover_Controller.jpg, hdfs-2185.txt, hdfs-2185.txt, hdfs-2185.txt,
hdfs-2185.txt, hdfs-2185.txt, zkfc-design.pdf, zkfc-design.pdf, zkfc-design.pdf, zkfc-design.pdf,
> This jira is for a ZK-based FailoverController daemon. The FailoverController is a separate
daemon from the NN that does the following:
> * Initiates leader election (via ZK) when necessary
> * Performs health monitoring (aka failure detection)
> * Performs fail-over (standby to active and active to standby transitions)
> * Heartbeats to ensure the liveness
> It should have the same/similar interface as the Linux HA RM to aid pluggability.


