hadoop-common-issues mailing list archives

From "Todd Lipcon (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-8247) Auto-HA: add a config to enable auto-HA, which disables manual FC
Date Mon, 09 Apr 2012 23:29:15 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-8247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250294#comment-13250294 ]

Todd Lipcon commented on HADOOP-8247:
-------------------------------------

I also ran the manual tests again. Here's the usage output of HAAdmin:

{code}
Usage: DFSHAAdmin [-ns <nameserviceId>]
    [-transitionToActive [--forcemanual] <serviceId>]
    [-transitionToStandby [--forcemanual] <serviceId>]
    [-failover [--forcefence] [--forceactive] [--forcemanual] <serviceId> <serviceId>]
    [-getServiceState <serviceId>]
    [-checkHealth <serviceId>]
    [-help <command>]

  --forceManual allows the manual failover commands to be used
                even when automatic failover is enabled. This
                flag is DANGEROUS and should only be used with
                expert guidance.
{code}

Here's what happens if I try to use a state change command with auto-HA enabled:

{code}
$ ./bin/hdfs haadmin -transitionToActive nn1
Automatic failover is enabled for NameNode at todd-w510/127.0.0.1:8021
Refusing to manually manage HA state, since it may cause
a split-brain scenario or other incorrect state.
If you are very sure you know what you are doing, please 
specify the forcemanual flag.
$ echo $?
255
{code}

I also checked the other two state-changing operations ({{-transitionToStandby}} and {{-failover}}); both yielded the same error message.
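
The refusal behavior shown above can be sketched roughly as follows. This is only an illustration of the check, not the code in the attached patch: the class name, method name, and messages are hypothetical, and the flag spelling is taken from the transcripts in this comment. A JVM that exits with status -1 is reported by the shell as 255, which matches the {{echo $?}} output.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the guard; not the actual HAAdmin implementation.
class ManualHaGuardSketch {
    // Flag spelling as typed in the transcripts of this comment (assumption).
    static final String FORCE_MANUAL = "-forcemanual";

    // Commands that change HA state, per the usage output.
    static final List<String> STATE_CHANGING =
        Arrays.asList("-transitionToActive", "-transitionToStandby", "-failover");

    /**
     * Returns 0 if the command may proceed and -1 if it is refused.
     * Read-only commands (for example -getServiceState) are never refused.
     */
    static int check(boolean autoFailoverEnabled, String... args) {
        if (args.length == 0) {
            return -1; // nothing to run
        }
        boolean stateChanging = STATE_CHANGING.contains(args[0]);
        boolean forced = Arrays.asList(args).contains(FORCE_MANUAL);
        if (stateChanging && autoFailoverEnabled && !forced) {
            // Illustrative message only.
            System.err.println("Refusing manual HA state change: automatic failover is enabled.");
            return -1;
        }
        if (stateChanging && autoFailoverEnabled) {
            // Forced: warn, then proceed.
            System.err.println("WARN: proceeding with manual HA state management.");
        }
        return 0;
    }

    public static void main(String[] args) {
        // Refused without the force flag, allowed with it.
        System.out.println(check(true, "-transitionToActive", "nn1"));
        System.out.println(check(true, "-transitionToStandby", "-forcemanual", "nn1"));
    }
}
```

The check keys only on the first argument being a state-changing command, so the read-only commands pass through regardless of the auto-HA setting.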


- I verified that {{-getServiceState}} and {{-checkHealth}} continue to work.

- I verified that the {{-forcemanual}} flag works:

{code}
$ ./bin/hdfs haadmin -transitionToStandby -forcemanual nn1
12/04/09 16:12:38 WARN ha.HAAdmin: Proceeding with manual HA state management even though automatic failover is enabled for NameNode at todd-w510/127.0.0.1:8021
{code}
(The same holds for {{-transitionToActive}} and {{-failover}}.)

- Verified that {{start-dfs.sh}} starts the ZKFCs on both of my configured NNs when auto-HA
is enabled. Also verified {{stop-dfs.sh}} stops the ZKFCs. Discovered trivial bug HDFS-3234
here.

----

Next, I modified my config to set the auto failover flag to false.

- verified that {{start-dfs.sh}} doesn't try to start ZKFCs.
- verified that if I try to start a ZKFC, it bails:
{code}
12/04/09 16:19:12 INFO tools.DFSZKFailoverController: Failover controller configured for NameNode nameserviceId1.nn2
12/04/09 16:19:12 FATAL ha.ZKFailoverController: Automatic failover is not enabled for NameNode at todd-w510/127.0.0.1:8022. Please ensure that automatic failover is enabled in the configuration before running the ZK failover controller.
{code}

- verified that the haadmin commands all function without any {{-forcemanual}} flag specified.
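
For reference, the flag lookup and the ZKFC startup check exercised above can be sketched along these lines. This is a simplified stand-in that uses a plain map rather than the Hadoop configuration classes. The key name {{dfs.ha.automatic-failover.enabled}} and the per-nameservice suffix convention are assumptions here, since this comment does not spell them out, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; a plain Map stands in for the real configuration object.
class AutoFailoverFlagSketch {
    // Assumed key name; not stated in this comment.
    static final String KEY = "dfs.ha.automatic-failover.enabled";

    /** Prefers the nameservice-scoped key, then the unscoped key; defaults to false. */
    static boolean isAutoFailoverEnabled(Map<String, String> conf, String nameserviceId) {
        String scoped = conf.get(KEY + "." + nameserviceId);
        String value = (scoped != null) ? scoped : conf.get(KEY);
        return Boolean.parseBoolean(value); // parseBoolean(null) is false
    }

    /** Mirrors the ZKFC behavior above: refuse to start when the flag is off. */
    static void startFailoverController(Map<String, String> conf, String nameserviceId) {
        if (!isAutoFailoverEnabled(conf, nameserviceId)) {
            throw new IllegalStateException(
                "Automatic failover is not enabled for nameservice " + nameserviceId);
        }
        // A real controller would connect to ZooKeeper here.
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(KEY, "false");
        conf.put(KEY + ".ns1", "true"); // enable auto-HA for ns1 only
        System.out.println(isAutoFailoverEnabled(conf, "ns1"));
        System.out.println(isAutoFailoverEnabled(conf, "ns2"));
    }
}
```

Defaulting to false when neither key is set keeps manual failover usable on clusters that never configured auto-HA.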

                
> Auto-HA: add a config to enable auto-HA, which disables manual FC
> -----------------------------------------------------------------
>
>                 Key: HADOOP-8247
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8247
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: auto-failover, ha
>    Affects Versions: Auto Failover (HDFS-3042)
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hadoop-8247.txt, hadoop-8247.txt, hadoop-8247.txt, hadoop-8247.txt
>
>
> Currently, if automatic failover is set up and running, and the user uses the "haadmin -failover" command, he or she can end up putting the system in an inconsistent state, where the state in ZK disagrees with the actual state of the world. To fix this, we should add a config flag which is used to enable auto-HA. When this flag is set, we should disallow use of the haadmin command to initiate failovers. We should refuse to run ZKFCs when the flag is not set. Of course, this flag should be scoped by nameservice.


        
