hadoop-common-issues mailing list archives

From "Aaron T. Myers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-8279) Auto-HA: Allow manual failover to be invoked from zkfc.
Date Tue, 01 May 2012 18:36:52 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-8279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265958#comment-13265958 ]

Aaron T. Myers commented on HADOOP-8279:

Patch looks pretty good to me, Todd. A few little comments:

# "-forceFence doesn't seem to have any real use cases with auto-HA so it isn't implemented."
- I don't follow the reasoning. Seems like it should be just as applicable to auto-HA as manual.
# "If the attempt to transition to standby succeeds, then the ZKFC will delete the breadcrumb
node in ZooKeeper" - might want to specify which ZKFC will do the deletion.
# "If the node is healthy and not active, it sends an RPC to the current active, asking it
to yield from the election." - it actually sends an RPC to the ZKFC associated with the current active.
# "if the current active does not respond to the graceful request, throws an exception indicating
the reason for failure." - I recommend you make it explicit which graceful request this is
referring to. In fact, if the active NN fails to respond to the graceful request to transition
to standby, it will be fenced. It's the failure of the active ZKFC to respond to the cedeActive
calls that results in a failure of gracefulFailover.
# I think you need interface annotations on ZKFCRpcServer, or perhaps it can be made package-private?
# In ZKFCProtocol#cedeActive you declare the parameter to be in millis, but in the ZKFCRpcServer#cedeActive
implementation, you say the period is in seconds.
# I don't see much point in having both ZKFCRpcServer#stop and ZKFCRpcServer#join. Why not
just call this.server.join in ZKFCRpcServer#stop?
# "periodically check health state since, because entering an" - doesn't quite parse.
# I think the log message about the timeout elapsing in ZKFailoverController#waitForActiveAttempt
should probably be at least at WARN level instead of INFO.
# "It's possible that it's in standby but just about to go into active, no? Is there some
race here?" - should this comment now be removed?
# I recommend you change the value of DFS_HA_ZKFC_PORT_DEFAULT to something other than 8021.
I've seen a lot of JTs in the wild with their default port set to 8021.
# The design in the document posted to HDFS-2185 mentions introducing "-to" and "-from" parameters
to the `haadmin -failover' command, but this implementation doesn't do that. That seems fine
by me, but I'm curious why you chose to do it this way.
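
To illustrate the units comment on cedeActive above: one way to avoid a millis/seconds mismatch is to carry the unit in the parameter name and convert explicitly wherever another unit is wanted. This is only a sketch. CedeActiveProtocol, CedeActiveServer, and their methods are hypothetical stand-ins, not the classes or signatures in the patch.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the RPC interface under review; the real
// ZKFCProtocol#cedeActive signature may differ.
interface CedeActiveProtocol {
    /** Ask the ZKFC to stay out of the election for the given number of milliseconds. */
    void cedeActive(int millisToCede);
}

// Hypothetical implementation. The unit appears in every name, and the only
// conversion (millis to seconds, for a log-style message) is explicit.
final class CedeActiveServer implements CedeActiveProtocol {
    private long cedeMillis = 0;

    @Override
    public void cedeActive(int millisToCede) {
        if (millisToCede < 0) {
            throw new IllegalArgumentException("millisToCede must be >= 0: " + millisToCede);
        }
        this.cedeMillis = millisToCede;
    }

    long getCedeMillis() {
        return cedeMillis;
    }

    /** Human-readable description; seconds are truncated to a whole number. */
    String describe() {
        return "ceding active state for " + cedeMillis + " ms (~"
            + TimeUnit.MILLISECONDS.toSeconds(cedeMillis) + " s)";
    }
}
```

With the unit in the parameter name, the declaration, the implementation, and any javadoc all have to disagree visibly before the kind of mismatch noted above can slip in.
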
> Auto-HA: Allow manual failover to be invoked from zkfc.
> -------------------------------------------------------
>                 Key: HADOOP-8279
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8279
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: auto-failover, ha
>    Affects Versions: Auto Failover (HDFS-3042)
>            Reporter: Mingjie Lai
>            Assignee: Todd Lipcon
>             Fix For: Auto Failover (HDFS-3042)
>         Attachments: hadoop-8279.txt, hadoop-8279.txt, hadoop-8279.txt, hadoop-8279.txt
> HADOOP-8247 introduces a configuration flag to prevent potential status inconsistency between
zkfc and namenode, by making auto and manual failover mutually exclusive.
> However, as described in section 2.7.2 of the design doc at HDFS-2185, we should allow manual
and auto failover to co-exist, by:
> - adding some rpc interfaces at zkfc
> - manual failover shall be triggered by haadmin, and handled by zkfc if auto failover
is enabled. 
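
The routing in the last bullet can be sketched roughly as follows, assuming a boolean auto-failover setting and two hypothetical collaborators (ZkfcClient and DirectFailover). None of these names come from the patch or from Hadoop's API; they only illustrate the idea that haadmin hands the request to the ZKFC when auto failover is on.

```java
// Hypothetical sketch: decide who handles a manually requested failover.
final class ManualFailoverRouter {
    enum Route { VIA_ZKFC, DIRECT }

    /** Hypothetical client for the ZKFC RPC interface. */
    interface ZkfcClient {
        void gracefulFailover();
    }

    /** Hypothetical handle for the existing manual failover path. */
    interface DirectFailover {
        void failover();
    }

    private final boolean autoFailoverEnabled;
    private final ZkfcClient zkfc;
    private final DirectFailover direct;

    ManualFailoverRouter(boolean autoFailoverEnabled, ZkfcClient zkfc, DirectFailover direct) {
        this.autoFailoverEnabled = autoFailoverEnabled;
        this.zkfc = zkfc;
        this.direct = direct;
    }

    /** Delegates to the ZKFC when auto failover is enabled, otherwise fails over directly. */
    Route failover() {
        if (autoFailoverEnabled) {
            zkfc.gracefulFailover();
            return Route.VIA_ZKFC;
        }
        direct.failover();
        return Route.DIRECT;
    }
}
```
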


