From: "Todd Lipcon (Commented) (JIRA)"
To: common-issues@hadoop.apache.org
Reply-To: common-issues@hadoop.apache.org
Date: Mon, 9 Apr 2012 23:45:15 +0000 (UTC)
Message-ID: <403401895.5413.1334015115907.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <2024739814.15210.1333587322121.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HADOOP-8247) Auto-HA: add a config to enable auto-HA, which disables manual FC

[ https://issues.apache.org/jira/browse/HADOOP-8247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250307#comment-13250307 ]

Todd Lipcon commented on HADOOP-8247:
-------------------------------------

bq. There are always admins who disregard these warnings

I think they deserve what they get... admins can also decide to run "rm -Rf /my/metadata/dir" and get into a bad state.

bq. Instead, wouldn't it be better to come up with a set of procedures to unwedge the cluster, starting with setting auto-failover key to false, resetting NNs and using manual failover

Presumably you want to be able to do this without incurring downtime. Certainly, if downtime is acceptable, that would be the right response. But I still think a manual override is useful here for advanced operators who need it in exceptional circumstances. As I said above, I'm OK with giving it a scarier name and/or making it prompt for confirmation upon use, with a scary warning message. I'm even OK with removing it from the documentation, so people aren't lured into using it when they don't really know what they're doing.

> Auto-HA: add a config to enable auto-HA, which disables manual FC
> -----------------------------------------------------------------
>
>                 Key: HADOOP-8247
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8247
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: auto-failover, ha
>    Affects Versions: Auto Failover (HDFS-3042)
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hadoop-8247.txt, hadoop-8247.txt, hadoop-8247.txt, hadoop-8247.txt
>
>
> Currently, if automatic failover is set up and running, and the user uses the "haadmin -failover" command, he or she can end up putting the system in an inconsistent state, where the state in ZK disagrees with the actual state of the world. To fix this, we should add a config flag which is used to enable auto-HA.
> When this flag is set, we should disallow use of the haadmin command to initiate failovers. We should refuse to run ZKFCs when the flag is not set. Of course, this flag should be scoped by nameservice.
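
For readers following the thread, the rules discussed above could be sketched roughly as follows. This is only an illustration, not the attached patch: the class `AutoHaGate`, its method names, the key string, the "scoped key overrides global key" lookup, and the `forceConfirmed` parameter (standing in for the confirmed manual override) are all assumptions made for the example, and a plain `Map` stands in for the real configuration object.

```java
import java.util.Map;

/** Sketch of the gating rules proposed in HADOOP-8247. All names are hypothetical. */
final class AutoHaGate {
    // Assumed key name, for illustration only; the issue text does not fix one.
    static final String AUTO_HA_KEY = "dfs.ha.automatic-failover.enabled";

    private AutoHaGate() {}

    /**
     * Returns whether auto-HA is enabled for the given nameservice.
     * Assumption: a "<key>.<nameservice>" entry takes precedence over the global key.
     */
    static boolean isAutoHaEnabled(Map<String, String> conf, String nameservice) {
        String scoped = conf.get(AUTO_HA_KEY + "." + nameservice);
        String value = (scoped != null) ? scoped : conf.get(AUTO_HA_KEY);
        // parseBoolean(null) is false, so an unset flag means "disabled".
        return Boolean.parseBoolean(value);
    }

    /**
     * haadmin side: refuse a manual failover while auto-HA is enabled,
     * unless the operator has explicitly confirmed the override.
     */
    static void checkManualFailoverAllowed(Map<String, String> conf,
                                           String nameservice,
                                           boolean forceConfirmed) {
        if (isAutoHaEnabled(conf, nameservice) && !forceConfirmed) {
            throw new IllegalStateException(
                "Automatic failover is enabled for " + nameservice
                + "; refusing to initiate a manual failover.");
        }
    }

    /** ZKFC side: refuse to start when auto-HA is not enabled for the nameservice. */
    static void checkZkfcMayStart(Map<String, String> conf, String nameservice) {
        if (!isAutoHaEnabled(conf, nameservice)) {
            throw new IllegalStateException(
                "Automatic failover is not enabled for " + nameservice
                + "; refusing to run the ZKFC.");
        }
    }
}
```

With a per-nameservice entry set to "true" for one nameservice only, the sketch rejects an unconfirmed manual failover for that nameservice and rejects a ZKFC start for every other one, which is the symmetry the description asks for. Whether the override should exist at all, and how loudly it should warn, is exactly what the comment above is debating.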