From: "Stephen O'Donnell (JIRA)"
To: hdfs-issues@hadoop.apache.org
Date: Fri, 14 Jun 2019 16:54:00 +0000 (UTC)
Subject: [jira] [Commented] (HDFS-14563) Enhance interface about recommissioning/decommissioning

    [ https://issues.apache.org/jira/browse/HDFS-14563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16864264#comment-16864264 ]

Stephen O'Donnell commented on HDFS-14563:
------------------------------------------

Looking at the source, it seems it would
be feasible to take the DNS lookup out of the write lock. Right now, it calls:

{code}
public void refreshNodes(final Configuration conf) throws IOException {
  refreshHostsReader(conf);
  namesystem.writeLock();
  try {
    refreshDatanodes();
    countSoftwareVersions();
  } finally {
    namesystem.writeUnlock();
  }
}
{code}

The call refreshHostsReader(conf) reads the new config file and does a DNS lookup on each entry - the write lock is not held there. Then the main work is done here:

{code}
private void refreshDatanodes() {
  final Map<String, DatanodeDescriptor> copy;
  synchronized (this) {
    copy = new HashMap<>(datanodeMap);
  }
  for (DatanodeDescriptor node : copy.values()) {
    // Check if not include.
    if (!hostConfigManager.isIncluded(node)) {
      node.setDisallowed(true);
    } else {
      long maintenanceExpireTimeInMS =
          hostConfigManager.getMaintenanceExpirationTimeInMS(node);
      if (node.maintenanceNotExpired(maintenanceExpireTimeInMS)) {
        datanodeAdminManager.startMaintenance(
            node, maintenanceExpireTimeInMS);
      } else if (hostConfigManager.isExcluded(node)) {
        datanodeAdminManager.startDecommission(node);
      } else {
        datanodeAdminManager.stopMaintenance(node);
        datanodeAdminManager.stopDecommission(node);
      }
    }
    node.setUpgradeDomain(hostConfigManager.getUpgradeDomain(node));
  }
}
{code}

The isIncluded() and isExcluded() methods all call node.getResolvedAddress(), which does the DNS lookup. We could probably change things to perform all the DNS lookups outside of the write lock, and then take the lock and process the nodes, changing or overloading isIncluded() etc. to take the resolved InetAddress rather than the DatanodeDescriptor. It would not shorten the time the operation takes to run overall, but it would move the long-running DNS work out of the write lock and avoid blocking the namenode for the entire time.
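The two-phase split described above can be sketched as follows. This is a minimal illustration only, using hypothetical simplified types (a plain ReentrantReadWriteLock and a fake resolve() standing in for the blocking DNS lookup), not the actual DatanodeManager/HostConfigManager API: all slow lookups run first with no lock held, and the write lock is then taken only for the fast bookkeeping.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of the proposed refactor: resolve every host before
// taking the write lock, then apply include/exclude decisions under the lock.
class RefreshSketch {
  static final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();

  // Stand-in for the slow, blocking DNS lookup (the real code resolves
  // each host via the node's address); returns a fake address string.
  static String resolve(String host) {
    return "10.0.0." + (Math.abs(host.hashCode()) % 256);
  }

  // hostsToIncluded maps each hostname to whether it is in the include list.
  static List<String> refresh(Map<String, Boolean> hostsToIncluded) {
    // Phase 1: all slow lookups happen with no lock held.
    Map<String, String> resolved = new HashMap<>();
    for (String host : hostsToIncluded.keySet()) {
      resolved.put(host, resolve(host));
    }
    // Phase 2: the write lock is held only for the fast in-memory work.
    List<String> disallowed = new ArrayList<>();
    fsLock.writeLock().lock();
    try {
      for (Map.Entry<String, Boolean> e : hostsToIncluded.entrySet()) {
        if (!e.getValue()) {
          disallowed.add(resolved.get(e.getKey()));
        }
      }
    } finally {
      fsLock.writeLock().unlock();
    }
    return disallowed;
  }
}
```

With this shape, the lock hold time no longer scales with DNS latency, only with the number of entries to process in memory.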
> Enhance interface about recommissioning/decommissioning
> -------------------------------------------------------
>
>                 Key: HDFS-14563
>                 URL: https://issues.apache.org/jira/browse/HDFS-14563
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client, namenode
>            Reporter: He Xiaoqiao
>            Assignee: He Xiaoqiao
>            Priority: Major
>
> In the current implementation, if we need to decommission or recommission one datanode, the only way is to add the datanode to the include or exclude file under the namenode configuration path, then execute the command `bin/hadoop dfsadmin -refreshNodes`, which triggers the namenode to reload the include/exclude files and start recommissioning or decommissioning the datanode.
> The shortcomings of this approach are:
> a. The namenode reloads the include/exclude configuration files from disk; if I/O load is high, the handler may be blocked.
> b. The namenode has to process every datanode in the include and exclude configurations. If there are many datanodes pending processing (very common for a large cluster), the namenode can hang for hundreds of seconds in the worst case while holding the write lock, waiting for recommission/decommission to finish.
> I think we should expose one lightweight interface to support recommissioning or decommissioning a single datanode, so that we can operate on datanodes using dfsadmin more smoothly.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org