From: "Chen Zhang (JIRA)"
To: hdfs-issues@hadoop.apache.org
Date: Tue, 23 Jul 2019 09:45:00 +0000 (UTC)
Subject: [jira] [Comment Edited] (HDFS-12820) Decommissioned datanode is counted in service cause datanode allcating failure

    [ https://issues.apache.org/jira/browse/HDFS-12820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890847#comment-16890847 ]

Chen Zhang edited comment on HDFS-12820 at 7/23/19 9:44 AM:
------------------------------------------------------------
Hi [~jojochuang], I've checked the code on the trunk branch, and I think this issue still exists in the latest version.

If we decommission a datanode and then stop it, the *nodesInService* field of DatanodeStats is not decremented; see the following code:
{code:java}
synchronized void subtract(final DatanodeDescriptor node) {
  xceiverCount -= node.getXceiverCount();
  if (node.isInService()) { // Admin.DECOMMISSIONED does not count as isInService
    capacityUsed -= node.getDfsUsed();
    capacityUsedNonDfs -= node.getNonDfsUsed();
    blockPoolUsed -= node.getBlockPoolUsed();
    nodesInService--;
    nodesInServiceXceiverCount -= node.getXceiverCount();
    capacityTotal -= node.getCapacity();
    capacityRemaining -= node.getRemaining();
    cacheCapacity -= node.getCacheCapacity();
    cacheUsed -= node.getCacheUsed();
  } else if (node.isDecommissionInProgress() ||
      node.isEnteringMaintenance()) {
    cacheCapacity -= node.getCacheCapacity();
    cacheUsed -= node.getCacheUsed();
  }
  ...
}
{code}
So if we have a cluster of 100 nodes and we decommission and stop 50 of them, the *nodesInService* variable will still be 100. The value that stats.getInServiceXceiverAverage returns is then only half of the real "average xceiver count", which causes most nodes to be treated as overloaded in the following code:
{code:java}
boolean excludeNodeByLoad(DatanodeDescriptor node) {
  final double maxLoad = considerLoadFactor *
      stats.getInServiceXceiverAverage(); // calculated as totalXceiverCount / nodesInService
  final int nodeLoad = node.getXceiverCount();
  if ((nodeLoad > maxLoad) && (maxLoad > 0)) {
    logNodeIsNotChosen(node, NodeNotChosenReason.NODE_TOO_BUSY,
        "(load: " + nodeLoad + " > " + maxLoad + ")");
    return true;
  }
  return false;
}
{code}

> Decommissioned datanode is counted in service cause datanode allcating failure
> ------------------------------------------------------------------------------
>
>                 Key: HDFS-12820
>                 URL: https://issues.apache.org/jira/browse/HDFS-12820
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: block placement
>    Affects Versions: 2.4.0
>            Reporter: Gang Xie
>            Priority: Major
>
> When allocating a datanode for a dfsclient write with load considered, the code checks whether the datanode is overloaded by calculating the average xceiver count of all in-service datanodes. But if a datanode is decommissioned and becomes dead, it is still treated as in service, which makes the average load much higher than the real one, especially when the number of decommissioned datanodes is large. In our cluster of 180 datanodes, 100 of them are decommissioned, and the average load is 17.
> This failed all the datanode allocation.
> private void subtract(final DatanodeDescriptor node) {
>   capacityUsed -= node.getDfsUsed();
>   blockPoolUsed -= node.getBlockPoolUsed();
>   xceiverCount -= node.getXceiverCount();
> {color:red}  if (!(node.isDecommissionInProgress() || node.isDecommissioned())) {{color}
>     nodesInService--;
>     nodesInServiceXceiverCount -= node.getXceiverCount();
>     capacityTotal -= node.getCapacity();
>     capacityRemaining -= node.getRemaining();
>   } else {
>     capacityTotal -= node.getDfsUsed();
>   }
>   cacheCapacity -= node.getCacheCapacity();
>   cacheUsed -= node.getCacheUsed();
> }

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
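The arithmetic behind the failure described above can be sketched in a few lines. This is an editor's illustration, not HDFS code: the class name, field names, and the considerLoadFactor value of 2.0 are stand-ins, while the node counts (180 total, 100 decommissioned and stopped) follow the report. Because nodesInService stays at 180, the computed average halves, the maxLoad threshold shrinks, and a node carrying the true average load is rejected as too busy.

```java
// Hypothetical sketch of the skewed in-service xceiver average; numbers
// mirror the report above, the code does not reproduce DatanodeStats.
public class SkewedAverageDemo {
    public static void main(String[] args) {
        int totalNodes = 180;
        int deadDecommissioned = 100;                     // stopped after decommission
        int liveNodes = totalNodes - deadDecommissioned;  // 80 nodes doing real work

        int totalXceivers = 3060;                         // all load sits on live nodes
        int realPerNodeLoad = totalXceivers / liveNodes;  // ~38 xceivers per live node

        // The bug: nodesInService was never decremented for the dead nodes,
        // so the denominator is 180 instead of 80.
        int staleNodesInService = totalNodes;
        double skewedAverage = (double) totalXceivers / staleNodesInService; // 17.0

        // excludeNodeByLoad-style check: maxLoad = considerLoadFactor * average
        double considerLoadFactor = 2.0;                  // assumed factor for illustration
        double maxLoad = considerLoadFactor * skewedAverage; // 34.0

        boolean excluded = (realPerNodeLoad > maxLoad) && (maxLoad > 0);
        System.out.println("skewedAverage=" + skewedAverage
            + " maxLoad=" + maxLoad
            + " realPerNodeLoad=" + realPerNodeLoad
            + " excluded=" + excluded);
        // A node at the *real* average (38) exceeds maxLoad (34), so under
        // these assumptions every typically loaded node is excluded.
    }
}
```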