Date: Thu, 22 Mar 2012 23:20:26 +0000 (UTC)
From: "Todd Lipcon (Updated) (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Updated] (HDFS-3071) haadmin failover command does not provide enough detail for when target NN is not ready to be active

    [ https://issues.apache.org/jira/browse/HDFS-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-3071:
------------------------------

    Attachment: hdfs-3071.txt

Found one more issue in manual testing, which made me go back and add automated tests for this feature. I fixed TestDFSHAAdminMiniCluster to actually record the error output, and added an assertion to check that it's correct for the safemode case. Also tested locally. I ran all the HA tests in both common and HDFS as well.

> haadmin failover command does not provide enough detail for when target NN is not ready to be active
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-3071
>                 URL: https://issues.apache.org/jira/browse/HDFS-3071
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: ha
>    Affects Versions: 0.24.0
>            Reporter: Philip Zeyliger
>            Assignee: Todd Lipcon
>         Attachments: hdfs-3071.txt, hdfs-3071.txt, hdfs-3071.txt, hdfs-3071.txt, hdfs-3071.txt, hdfs-3071.txt
>
>
> When running the failover command, you can get an error message like the following:
> {quote}
> $ hdfs --config $(pwd) haadmin -failover namenode2 namenode1
> Failover failed: xxx.yyy/1.2.3.4:8020 is not ready to become active
> {quote}
> Unfortunately, the error message doesn't describe why that node isn't ready to be active. In my case, the target namenode's logs don't indicate anything either. It turned out that the issue was "Safe mode is ON.Resources are low on NN. Safe mode must be turned off manually.", but ideally the user would be told that at the time of the failover.
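The improvement requested in the issue (tell the user *why* the target NameNode is not ready, not just that it isn't) can be illustrated with a small, self-contained sketch: the readiness check returns a human-readable reason alongside the boolean, and the failover path appends that reason to its error message. All names below (FailoverSketch, Readiness, checkTargetReady, FailoverFailedException) are hypothetical illustrations, not Hadoop's HA classes and not the code in the attached hdfs-3071.txt patch.

```java
import java.util.Objects;

/**
 * Hypothetical sketch (not a Hadoop API): a readiness check that carries a
 * human-readable reason, so a failover tool can report why the target
 * cannot become active instead of only saying that it cannot.
 */
class FailoverSketch {

    /** Result of asking a target NameNode whether it can become active. */
    static final class Readiness {
        private final boolean ready;
        private final String notReadyReason; // null when the target is ready

        private Readiness(boolean ready, String notReadyReason) {
            this.ready = ready;
            this.notReadyReason = notReadyReason;
        }

        static Readiness ready() {
            return new Readiness(true, null);
        }

        static Readiness notReady(String reason) {
            // A "not ready" result must always explain itself.
            return new Readiness(false, Objects.requireNonNull(reason));
        }

        boolean isReady() {
            return ready;
        }

        String getNotReadyReason() {
            return notReadyReason;
        }
    }

    /** Thrown when the pre-failover check fails; the message includes the reason. */
    static final class FailoverFailedException extends Exception {
        FailoverFailedException(String message) {
            super(message);
        }
    }

    /**
     * Pre-failover check: rather than a bare "is not ready to become active",
     * append the reason reported by the target (e.g. safe mode being on).
     */
    static void checkTargetReady(String target, Readiness status)
            throws FailoverFailedException {
        if (!status.isReady()) {
            throw new FailoverFailedException(
                target + " is not ready to become active: "
                    + status.getNotReadyReason());
        }
    }

    public static void main(String[] args) {
        // Simulate the situation from the issue: the target is stuck in safe mode.
        Readiness status = Readiness.notReady(
            "Safe mode is ON. Resources are low on NN. "
                + "Safe mode must be turned off manually.");
        try {
            checkTargetReady("xxx.yyy/1.2.3.4:8020", status);
        } catch (FailoverFailedException e) {
            System.err.println("Failover failed: " + e.getMessage());
        }
    }
}
```

With this shape, the error from the example above would end with the safe-mode explanation, which is the detail the reporter had to dig out separately. Keeping the reason inside the status object (rather than only in the target's logs) means any caller of the check, including a test like the TestDFSHAAdminMiniCluster assertion mentioned in the comment, can inspect it directly.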