hadoop-common-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-8279) Auto-HA: Allow manual failover to be invoked from zkfc.
Date Wed, 02 May 2012 05:34:55 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-8279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13266357#comment-13266357 ]

Todd Lipcon commented on HADOOP-8279:

bq. "-forceFence doesn't seem to have any real use cases with auto-HA so it isn't implemented."
- I don't follow the reasoning. Seems like it should be just as applicable to auto-HA as manual,

I chatted with Eli about this, since he's the one who originally added the "-forceFence" option.
The original motivation was to test the fencing script, but with this "manual failover" that's
probably not the best way to test it. Better would be to do something like kill -STOP the
active NN, which will both trigger a failover and trigger fencing. Another option might be
to create a new command like "-testFencer" which would (after requiring confirmation) shoot
down the active. But since it's a corner case let's address as a follow-up improvement.

bq. "If the attempt to transition to standby succeeds, then the ZKFC will delete the breadcrumb
node in ZooKeeper" - might want to specify which ZKFC will do the deletion.

Changed to:
   * If the attempt to transition to standby succeeds, then the ZKFC receiving
   * this RPC will delete its own breadcrumb node in ZooKeeper. Thus, the
   * next node to become active will not run any fencing process. Otherwise,
   * the breadcrumb will be left, such that the next active will fence this
   * node.
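
The breadcrumb rule in the revised javadoc can be sketched roughly as follows. This is a toy in-memory model, not the ZKFC code: {{BreadcrumbSketch}}, its methods, and the boolean standing in for the ZooKeeper node are hypothetical names used only for illustration.

```java
// Toy model of the breadcrumb rule; every name here is hypothetical.
class BreadcrumbSketch {
    // Stands in for the breadcrumb node that the active node's ZKFC leaves in ZooKeeper.
    private boolean breadcrumbPresent = true;

    /**
     * Models the ZKFC that receives the request to give up the active role.
     *
     * @param transitionToStandbySucceeded whether the local node became standby cleanly
     */
    void cedeActive(boolean transitionToStandbySucceeded) {
        if (transitionToStandbySucceeded) {
            // Clean handoff: delete our own breadcrumb so the next active skips fencing.
            breadcrumbPresent = false;
        }
        // Otherwise the breadcrumb stays, so the next active will fence this node.
    }

    /** The next node to become active fences only if a breadcrumb was left behind. */
    boolean nextActiveMustFence() {
        return breadcrumbPresent;
    }
}
```

In this model, a failed transition to standby is the only path that leaves the breadcrumb in place, which matches the "Otherwise" clause above.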

bq. "If the node is healthy and not active, it sends an RPC to the current active, asking
it to yield from the election." - it actually sends an RPC to the ZKFC associated with the
current active.

I actually removed the details here in ZKFCProtocol.java, electing instead to refer the reader
to the implementation. I think it's better for the ZKFCProtocol javadocs to explain the "outward"
behavior, and explain the actual implementation in the design doc and the inline comments
in ZKFailoverController. It now reads:

   * If the node is healthy and not active, it will try to initiate a graceful
   * failover to become active, returning only when it has successfully become
   * active. See {@link ZKFailoverController#gracefulFailoverToYou()} for the
   * implementation details.

bq. "if the current active does not respond to the graceful request, throws an exception indicating
the reason for failure." - I recommend you make it explicit which graceful request this is
referring to. In fact, if the active NN fails to respond to the graceful request to transition
to standby, it will be fenced. It's the failure of the active ZKFC to respond to the cedeActive
calls that results in a failure of gracefulFailover.

Per above, I changed this to only reference what a caller needs to know, instead of the underlying implementation:
   * If the node fails to successfully coordinate the failover, throws an
   * exception indicating the reason for failure.

bq. I think you need interface annotations on ZKFCRpcServer, or perhaps it can be made package-private?
Good catch. It can't be package-private because DFSZKFailoverController is in an HDFS package.
I annotated it LimitedPrivate to HDFS.

bq. In ZKFCProtocol#cedeActive you declare the parameter to be in millis, but in the ZKFCRpcServer#cedeActive
implementation, you say the period is in seconds.
Another good catch - I changed this late in the development of the patch and missed a spot.
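
One way to make this kind of unit mismatch harder to reintroduce is to carry the unit in the parameter name and convert explicitly with {{java.util.concurrent.TimeUnit}}. The sketch below assumes a hypothetical helper class {{CedeActiveUnits}}; it is not taken from the patch.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of the patch: keeps the unit visible at every step.
class CedeActiveUnits {
    /** Returns the nanoTime-style deadline before which the node should stay out of the election. */
    static long rejoinDeadlineNanos(long nowNanos, long millisToCede) {
        // Explicit conversion, so a reader never has to guess the unit.
        return nowNanos + TimeUnit.MILLISECONDS.toNanos(millisToCede);
    }

    /** Builds a log message that states the unit alongside the value. */
    static String describe(long millisToCede) {
        return "ceding active state for " + millisToCede + "ms";
    }
}
```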

bq. I don't see much point in having both ZKFCRpcServer#stop and ZKFCRpcServer#join. Why not
just call this.server.join in ZKFCRpcServer#stop?

Combined the two into a {{stopAndJoin}}
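
The combined method follows a common shutdown shape: signal the worker to stop, then block until it has exited. The sketch below shows that shape with a plain thread; {{StopAndJoinSketch}} and its worker loop are assumptions for illustration and do not reproduce ZKFCRpcServer.

```java
// Illustrative only: a plain worker thread stands in for the RPC server.
class StopAndJoinSketch {
    private volatile boolean running = true;

    private final Thread worker = new Thread(() -> {
        while (running) {
            try {
                Thread.sleep(10); // placeholder for serving requests
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    });

    void start() {
        worker.start();
    }

    /** Requests shutdown, then waits for the worker to finish. */
    void stopAndJoin() {
        running = false;
        worker.interrupt(); // wake the worker if it is sleeping
        try {
            worker.join();
        } catch (InterruptedException e) {
            // Preserve the caller's interrupt status instead of swallowing it.
            Thread.currentThread().interrupt();
        }
    }

    boolean isAlive() {
        return worker.isAlive();
    }
}
```

With a single method, callers cannot stop the worker and then forget to wait for it to exit.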

bq. "periodically check health state since, because entering an" - doesn't quite parse.


bq. I think the log message about the timeout elapsing in ZKFailoverController#waitForActiveAttempt
should probably be at least at WARN level instead of INFO.

bq. "It's possible that it's in standby but just about to go into active, no? Is there some
race here?" - should this comment now be removed?

This comment is basically about the situation described in HADOOP-8217, so it's still relevant.

bq. I recommend you change the value of DFS_HA_ZKFC_PORT_DEFAULT to something other than 8021.
I've seen a lot of JTs in the wild with their default port set to 8021.

Good point... I changed it to 8019.

bq. The design in the document posted to HDFS-2185 mentions introducing "-to" and "-from"
parameters to the `haadmin -failover' command, but this implementation doesn't do that. That
seems fine by me, but I'm curious why you chose to do it this way.

I ended up not changing it just to keep the syntax consistent with what we've already got
and avoid making this patch even longer. Let's discuss in a followup JIRA if we want to change
the syntax for this command.

> Auto-HA: Allow manual failover to be invoked from zkfc.
> -------------------------------------------------------
>                 Key: HADOOP-8279
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8279
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: auto-failover, ha
>    Affects Versions: Auto Failover (HDFS-3042)
>            Reporter: Mingjie Lai
>            Assignee: Todd Lipcon
>             Fix For: Auto Failover (HDFS-3042)
>         Attachments: hadoop-8279.txt, hadoop-8279.txt, hadoop-8279.txt, hadoop-8279.txt,
> HADOOP-8247 introduces a configuration flag to prevent potential status inconsistency between
> zkfc and namenode, by making auto and manual failover mutually exclusive.
> However, as described in section 2.7.2 of the design doc at HDFS-2185, we should allow manual
> and auto failover to co-exist, by:
> - adding some rpc interfaces at zkfc
> - manual failover shall be triggered by haadmin, and handled by zkfc if auto failover
> is enabled.


