hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2185) HA: HDFS portion of ZK-based FailoverController
Date Wed, 04 Apr 2012 23:00:23 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246819#comment-13246819 ]

Todd Lipcon commented on HDFS-2185:

bq. I'm not quite sure how it can be guaranteed. NN cannot be aware of who issues a transition,

My plan was to add an enum flag to RPCs such as {{transitionToActive}} and {{transitionToStandby}}
that would indicate who sent the request, for example "CLI_FAILOVER", "ZKFC_FAILOVER", or "FORCE".
The FORCE option would let an admin who *really* knows what he/she is doing override the
safety check. Otherwise, the haadmin commands can prevent users from accidentally shooting
themselves in the foot.
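To make the proposal concrete, here is a minimal Java sketch of such a flag and the check a NameNode might run before honoring a transition. The three enum values come from the comment above. Everything else is a hypothetical assumption and not the API that Hadoop eventually shipped: the class names `TransitionSource` and `TransitionGuard`, the `autoFailoverEnabled` setting, and the exact allow/deny policy.

```java
import java.util.Objects;

/** Hypothetical identifiers for who initiated a state transition (values taken from the proposal). */
enum TransitionSource {
    CLI_FAILOVER,   // manual failover requested through the haadmin CLI
    ZKFC_FAILOVER,  // automatic failover driven by the ZK failover controller
    FORCE           // admin explicitly overrides the safety check
}

/** Hypothetical guard a NameNode could apply before honoring transitionToActive/transitionToStandby. */
final class TransitionGuard {
    private final boolean autoFailoverEnabled; // assumed config switch, not a real Hadoop key

    TransitionGuard(boolean autoFailoverEnabled) {
        this.autoFailoverEnabled = autoFailoverEnabled;
    }

    /** Returns true if a transition requested by {@code source} should be allowed. */
    boolean isAllowed(TransitionSource source) {
        Objects.requireNonNull(source, "source");
        switch (source) {
            case FORCE:
                return true;                  // the admin takes responsibility for the outcome
            case ZKFC_FAILOVER:
                return autoFailoverEnabled;   // only the ZKFC drives transitions under auto failover
            case CLI_FAILOVER:
                return !autoFailoverEnabled;  // block manual transitions that could race the ZKFC
            default:
                throw new IllegalArgumentException("unknown source: " + source);
        }
    }
}
```

With automatic failover enabled, `new TransitionGuard(true).isAllowed(TransitionSource.CLI_FAILOVER)` returns false, while the same request tagged `FORCE` is accepted. This corresponds to the "prevent users from shooting themselves in the foot" behavior described above.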

bq. I still think it makes sense for ops to have an option to turn auto failover on/off on demand.
In case of ZKFC issues, we would still have an alternative way to bypass it. However, I'm not
sure whether it would help ops or confuse them.

That's a good point; it's useful for emergency situations. I think we can solve this with
docs, though: if you want to stop automatic failovers, first shut down the standby ZKFCs,
then the active ZKFC. If you bring them down in the other order, it won't break anything,
but you might get a failover in the process. I think adding a programmatic way to do this is
a future improvement.

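The documented shutdown procedure reduces to an ordering rule: stop the standby ZKFCs first and the active ZKFC last. The reverse order plausibly risks a failover because, once the active ZKFC goes away, a standby ZKFC that is still running may take over the election lock. The Java sketch below only computes that order. The `ZkfcNode` and `ZkfcShutdownPlan` names are hypothetical, and actually stopping each daemon is left to whatever tooling the operator uses.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical description of one ZKFC: the host it runs on and whether its NN is currently active. */
final class ZkfcNode {
    final String host;
    final boolean active;

    ZkfcNode(String host, boolean active) {
        this.host = host;
        this.active = active;
    }
}

/** Hypothetical helper that orders ZKFCs for a shutdown that avoids an unintended failover. */
final class ZkfcShutdownPlan {
    private ZkfcShutdownPlan() {}

    /** Returns the ZKFCs with every standby before the active one; input order is otherwise kept. */
    static List<ZkfcNode> shutdownOrder(List<ZkfcNode> nodes) {
        List<ZkfcNode> order = new ArrayList<>();
        for (ZkfcNode node : nodes) {
            if (!node.active) {
                order.add(node); // standby ZKFCs are stopped first
            }
        }
        for (ZkfcNode node : nodes) {
            if (node.active) {
                order.add(node); // the active ZKFC is stopped last
            }
        }
        return order;
    }
}
```

For an input of `nn1` (active) and `nn2` (standby), `shutdownOrder` returns `nn2` followed by `nn1`.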
> HA: HDFS portion of ZK-based FailoverController
> -----------------------------------------------
>                 Key: HDFS-2185
>                 URL: https://issues.apache.org/jira/browse/HDFS-2185
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: auto-failover, ha
>    Affects Versions: 0.24.0, 0.23.3
>            Reporter: Eli Collins
>            Assignee: Todd Lipcon
>             Fix For: Auto failover (HDFS-3042)
>         Attachments: Failover_Controller.jpg, hdfs-2185.txt, hdfs-2185.txt, hdfs-2185.txt,
> hdfs-2185.txt, hdfs-2185.txt, zkfc-design.pdf, zkfc-design.pdf, zkfc-design.pdf, zkfc-design.pdf,
>
> This jira is for a ZK-based FailoverController daemon. The FailoverController is a separate
> daemon from the NN that does the following:
> * Initiates leader election (via ZK) when necessary
> * Performs health monitoring (aka failure detection)
> * Performs fail-over (standby to active and active to standby transitions)
> * Heartbeats to ensure liveness
> It should have the same/similar interface as the Linux HA RM to aid pluggability.
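The duties listed above can be read as a single decision that the controller makes after each health-monitoring round. The following Java sketch is a hypothetical model of that decision. It assumes the controller knows four things: the local NN's health, whether it is participating in the ZK election, whether it currently holds the election lock, and the NN's current HA state. All type and method names are invented for illustration. Liveness heartbeats are assumed to be handled by the ZooKeeper session itself, so they are not modeled.

```java
/** Hypothetical result of one health-monitor probe of the local NN. */
enum Health { HEALTHY, UNHEALTHY }

/** Hypothetical HA states of the local NN. */
enum HaState { ACTIVE, STANDBY }

/** Hypothetical actions the controller can take after one monitoring round. */
enum Action { NONE, JOIN_ELECTION, QUIT_ELECTION, BECOME_ACTIVE, BECOME_STANDBY }

final class FailoverDecision {
    private FailoverDecision() {}

    /** Picks the next action from the health probe, election membership, lock ownership and HA state. */
    static Action decide(Health health, boolean inElection, boolean holdsLock, HaState current) {
        if (health == Health.UNHEALTHY) {
            if (inElection) {
                return Action.QUIT_ELECTION;  // give up the lock so a healthy peer can win it
            }
            // Already out of the election: make sure a sick NN does not stay active.
            return current == HaState.ACTIVE ? Action.BECOME_STANDBY : Action.NONE;
        }
        if (!inElection) {
            return Action.JOIN_ELECTION;      // healthy: compete for leadership via ZK
        }
        if (holdsLock && current != HaState.ACTIVE) {
            return Action.BECOME_ACTIVE;      // won the election: standby -> active
        }
        if (!holdsLock && current == HaState.ACTIVE) {
            return Action.BECOME_STANDBY;     // lost the lock: active -> standby
        }
        return Action.NONE;                   // state already matches the election outcome
    }
}
```

When the NN becomes unhealthy, the controller first quits the election rather than demoting the NN directly, which lets a healthy peer acquire the lock. A later round then demotes the local NN if it is still active.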

This message is automatically generated by JIRA.