Date: Tue, 9 Jan 2018 07:48:00 +0000 (UTC)
From: "Jianfei Jiang (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-12935) Get ambiguous result for DFSAdmin command in HA mode when only one namenode is up

    [ https://issues.apache.org/jira/browse/HDFS-12935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16317933#comment-16317933 ]

Jianfei Jiang commented on HDFS-12935:
--------------------------------------

Thanks [~brahmareddy] for your suggestions.
* {{isAtLeastOneActive}} was added only to produce an error hint and is not functionally necessary. I will remove it.
* Failure information will be printed to sysout.
* {{checkOperation(OperationCategory.READ);}} is changed to {{checkOperation(OperationCategory.WRITE)}}.
* {{metasave}} is fixed. I think {{listOpenFiles}} does not need to be fixed: unlike the other commands, it does not call all namenodes, as the following code shows.
{code:java}
if (isHaEnabled) {
  // In HA mode, listOpenFiles talks only to the active namenode.
  ProxyAndInfo<ClientProtocol> proxy = NameNodeProxies.createNonHAProxy(
      dfsConf, HAUtil.getAddressOfActive(getDFS()), ClientProtocol.class,
      UserGroupInformation.getCurrentUser(), false);
  openFilesRemoteIterator = new OpenFilesIterator(proxy.getProxy(),
      FsTracer.get(dfsConf));
}
{code}
* A patch for {{branch-2}} will be made.

Please review. Thanks.

> Get ambiguous result for DFSAdmin command in HA mode when only one namenode is up
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-12935
>                 URL: https://issues.apache.org/jira/browse/HDFS-12935
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: tools
>    Affects Versions: 3.0.0-beta1, 3.0.0
>            Reporter: Jianfei Jiang
>            Assignee: Jianfei Jiang
>         Attachments: HDFS-12935.002.patch, HDFS-12935.003.patch, HDFS-12935.004.patch, HDFS-12935.005.patch, HDFS_12935.001.patch
>
>
> In HA mode, if one namenode is down, most functions still work. Consider the following two cases:
> (1) nn1 up and nn2 down
> (2) nn1 down and nn2 up
> These two cases should be equivalent. However, some DFSAdmin commands give ambiguous results. When only nn1 is up, the command is sent successfully to the running namenode and takes effect, despite the exception (an IOException when connecting to the down namenode nn2). When only nn2 is up, the command has no effect at all and only the exception from connecting to nn1 is shown.
> See the following example of "hdfs dfsadmin -setBalancerBandwidth", which sets the balancer bandwidth for datanodes. It works, and all the datanodes receive the new value, only when nn1 is up. If only nn2 is up, the command throws an exception immediately and no datanode gets the bandwidth setting. Approximately ten DFSAdmin commands follow similar logic and may be ambiguous.
> [root@jiangjianfei01 ~]# hdfs haadmin -getServiceState nn1
> active
> [root@jiangjianfei01 ~]# hdfs dfsadmin -setBalancerBandwidth 12345
> *Balancer bandwidth is set to 12345 for jiangjianfei01/172.17.0.14:9820*
> setBalancerBandwidth: Call From jiangjianfei01/172.17.0.14 to jiangjianfei02:9820 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
> [root@jiangjianfei01 ~]# hdfs haadmin -getServiceState nn2
> active
> [root@jiangjianfei01 ~]# hdfs dfsadmin -setBalancerBandwidth 1234
> setBalancerBandwidth: Call From jiangjianfei01/172.17.0.14 to jiangjianfei01:9820 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
> [root@jiangjianfei01 ~]#
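
The asymmetry in the transcript above can be reproduced without any Hadoop dependency. The following is only a minimal sketch, not the DFSAdmin code or any of the attached patches: `HaCommandSketch`, `NameNodeCall`, `runFailFast`, and `runOnAll` are hypothetical names, and the sketch assumes the command is sent to the configured namenodes one after another in order. It contrasts a loop that aborts on the first `IOException` with one that tries every namenode, prints the failures to sysout, and fails only if no namenode could be reached.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HaCommandSketch {

  /** Hypothetical stand-in for one RPC sent to a single namenode. */
  interface NameNodeCall {
    void invoke(String nameNode) throws IOException;
  }

  /** Fail-fast loop: the first unreachable namenode aborts the whole command. */
  static List<String> runFailFast(List<String> nameNodes, NameNodeCall call)
      throws IOException {
    List<String> succeeded = new ArrayList<>();
    for (String nn : nameNodes) {
      call.invoke(nn); // an IOException here skips every later namenode
      succeeded.add(nn);
    }
    return succeeded;
  }

  /** Tolerant loop: tries every namenode and fails only if none succeeded. */
  static List<String> runOnAll(List<String> nameNodes, NameNodeCall call)
      throws IOException {
    List<String> succeeded = new ArrayList<>();
    List<String> failures = new ArrayList<>();
    for (String nn : nameNodes) {
      try {
        call.invoke(nn);
        succeeded.add(nn);
      } catch (IOException e) {
        failures.add(nn + ": " + e.getMessage());
      }
    }
    if (succeeded.isEmpty()) {
      throw new IOException("No namenode reachable: " + failures);
    }
    for (String failure : failures) {
      System.out.println("Failed on " + failure);
    }
    return succeeded;
  }

  public static void main(String[] args) throws IOException {
    List<String> nameNodes = Arrays.asList("nn1", "nn2");
    // Simulate case (2) from the issue: nn1 is down, nn2 is up.
    NameNodeCall call = nn -> {
      if (nn.equals("nn1")) {
        throw new IOException("Connection refused");
      }
    };
    try {
      runFailFast(nameNodes, call);
    } catch (IOException e) {
      System.out.println("fail-fast: " + e.getMessage());
    }
    System.out.println("tolerant: reached " + runOnAll(nameNodes, call));
  }
}
```

With the tolerant loop, cases (1) and (2) behave the same way: the command reaches whichever namenode is up and the failure for the other one is reported separately.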