Date: Thu, 4 Jun 2015 10:06:38 +0000 (UTC)
From: "surendra singh lilhore (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-8277) Safemode enter fails when Standby NameNode is down

    [ https://issues.apache.org/jira/browse/HDFS-8277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572488#comment-14572488 ]

surendra singh lilhore commented on HDFS-8277:
----------------------------------------------

Hi [~arpitagarwal],

*My Suggestion :*

bq. DFSAdmin#setSafeMode should send the RPC to the ANN only.
bq. SBN should fail setSafeMode via RPC call. This will be an incompatible change so for now it can just log an error and ignore the call.

I think we should send the RPC to both NameNodes.
Currently the safemode API has one parameter, {{isChecked}}:
{code}
  @Override // ClientProtocol
  public boolean setSafeMode(SafeModeAction action, boolean isChecked)
{code}
If {{isChecked}} is true, the RPC call will fail for the SBN and it will throw StandbyException. I think we should not change anything in the NameNode-side {{setSafeMode()}} API.

In the {{DFSAdmin}} command we currently call {{haNn.setSafeMode(action, false)}}, that is, with {{isChecked}} set to false. We can call it with true for both NameNodes instead, like this:
{code}
try {
  // If the NN is in standby, we will get a StandbyException
  inSafeMode = haNn.setSafeMode(action, true);
} catch (StandbyException e) {
  System.out.println("Skipping safemode for standby NameNode " + proxy.getAddress());
  continue;
} catch (IOException e) {
  System.out.println("Failed to connect " + proxy.getAddress());
  continue;
}
{code}
For the SBN we will catch the StandbyException and print the message. After this change we will get output like this:
{code}
Safe mode is ON in /10.19.92.73:8020
Skipping safemode for standby NameNode /10.19.92.74:8020
{code}
If my suggestion is not good, then I think we need to remove the {{boolean isChecked}} parameter, because this parameter has no use after adding the check for the SBN.

> Safemode enter fails when Standby NameNode is down
> --------------------------------------------------
>
>                 Key: HDFS-8277
>                 URL: https://issues.apache.org/jira/browse/HDFS-8277
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha, HDFS, namenode
>    Affects Versions: 2.6.0
>         Environment: HDP 2.2.0
>            Reporter: Hari Sekhon
>            Assignee: surendra singh lilhore
>            Priority: Minor
>         Attachments: HDFS-8277-safemode-edits.patch, HDFS-8277.patch, HDFS-8277_1.patch, HDFS-8277_2.patch, HDFS-8277_3.patch, HDFS-8277_4.patch
>
>
> HDFS fails to enter safemode when the Standby NameNode is down (eg. due to AMBARI-10536).
> {code}hdfs dfsadmin -safemode enter
> safemode: Call From nn2/x.x.x.x to nn1:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused{code}
> This appears to be a bug in that it's not trying both NameNodes like the standard hdfs client code does, and is instead stopping after getting a connection refused from nn1 which is down. I verified normal hadoop fs writes and reads via cli did work at this time, using nn2. I happened to run this command as the hdfs user on nn2 which was the surviving Active NameNode.
> After I re-bootstrapped the Standby NN to fix it the command worked as expected again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
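
For illustration, the behaviour discussed in this thread (try every NameNode, skip a standby, and keep going when one is unreachable) can be sketched as a small self-contained program. This is only a sketch under stated assumptions, not the actual patch: {{SafeModeSketch}}, {{NameNodeStub}}, {{State}}, {{enterSafeMode}} and the local {{StandbyException}} are hypothetical stand-ins and not the Hadoop classes, and the addresses are the ones from the sample output in the comment above.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration only; none of these are Hadoop classes.
class SafeModeSketch {

    // Local stand-in for a "NameNode is in standby" error; modelled as an IOException.
    static class StandbyException extends IOException {
        StandbyException(String message) { super(message); }
    }

    enum State { ACTIVE, STANDBY, DOWN }

    // Hypothetical NameNode proxy that only simulates the three cases from the thread.
    static class NameNodeStub {
        final String address;
        final State state;

        NameNodeStub(String address, State state) {
            this.address = address;
            this.state = state;
        }

        // Assumption: a checked call is rejected by a standby, and a node that is down refuses the connection.
        boolean setSafeMode(boolean isChecked) throws IOException {
            if (state == State.DOWN) {
                throw new ConnectException("Connection refused");
            }
            if (state == State.STANDBY && isChecked) {
                throw new StandbyException("Operation not supported in standby state");
            }
            return true;
        }
    }

    // Calls every NameNode with isChecked=true and collects one message per node,
    // continuing past standby or unreachable nodes instead of stopping at the first failure.
    static List<String> enterSafeMode(List<NameNodeStub> nameNodes) {
        List<String> messages = new ArrayList<>();
        for (NameNodeStub nn : nameNodes) {
            try {
                boolean inSafeMode = nn.setSafeMode(true);
                messages.add("Safe mode is " + (inSafeMode ? "ON" : "OFF") + " in " + nn.address);
            } catch (StandbyException e) {
                messages.add("Skipping safemode for standby NameNode " + nn.address);
            } catch (IOException e) {
                messages.add("Failed to connect " + nn.address);
            }
        }
        return messages;
    }

    public static void main(String[] args) {
        List<NameNodeStub> nameNodes = Arrays.asList(
                new NameNodeStub("/10.19.92.73:8020", State.ACTIVE),
                new NameNodeStub("/10.19.92.74:8020", State.STANDBY));
        for (String line : enterSafeMode(nameNodes)) {
            System.out.println(line);
        }
    }
}
```

The {{StandbyException}} catch has to come before the {{IOException}} catch because the former is a subclass of the latter here. With one node marked {{DOWN}}, the loop reports the failure for that node and still processes the remaining one, which is the case described in the original report.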