Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 319F7200D44 for ; Mon, 20 Nov 2017 10:11:07 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id 300E0160C0D; Mon, 20 Nov 2017 09:11:07 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 79321160BEC for ; Mon, 20 Nov 2017 10:11:06 +0100 (CET) Received: (qmail 1039 invoked by uid 500); 20 Nov 2017 09:11:05 -0000 Mailing-List: contact hdfs-dev-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list hdfs-dev@hadoop.apache.org Received: (qmail 1016 invoked by uid 99); 20 Nov 2017 09:11:05 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd2-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 20 Nov 2017 09:11:05 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id 565681A025A for ; Mon, 20 Nov 2017 09:11:04 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -99.202 X-Spam-Level: X-Spam-Status: No, score=-99.202 tagged_above=-999 required=6.31 tests=[KAM_ASCII_DIVIDERS=0.8, RP_MATCHES_RCVD=-0.001, SPF_PASS=-0.001, USER_IN_WHITELIST=-100] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024) with ESMTP id yQ3PrMRNOBhn for ; Mon, 20 Nov 2017 09:11:03 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTP id 1BEAB5F569 for ; Mon, 20 Nov 2017 09:11:03 +0000 (UTC) Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id 3A371E095A for ; Mon, 20 Nov 2017 09:11:01 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id 285D023F05 for ; Mon, 20 Nov 2017 09:11:00 +0000 (UTC) Date: Mon, 20 Nov 2017 09:11:00 +0000 (UTC) From: "Gang Xie (JIRA)" To: hdfs-dev@hadoop.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Reopened] (HDFS-12820) Decommissioned datanode is counted in service cause datanode allcating failure MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Mon, 20 Nov 2017 09:11:07 -0000 [ https://issues.apache.org/jira/browse/HDFS-12820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Xie reopened HDFS-12820: ----------------------------- After carefully check the issue reported in HDFS-9279, and found this issue is not a its dup. this case is mainly about the param {color:#d04437}nodesInService{color}. When a datanode is decommissioned and then dead. nodesInService will not be subtracted. Then when allocation, the dead node will be counted in the maxload, which makes the maxload very low, in turns, causes any allocation failing. if (considerLoad) { {color:#d04437} final double maxLoad = maxLoadRatio * stats.getInServiceXceiverAverage();{color} final int nodeLoad = node.getXceiverCount(); if (nodeLoad > maxLoad) { logNodeIsNotChosen(storage, "the node is too busy (load:"+nodeLoad+" > "+maxLoad+") "); stats.incrOverLoaded(); return false; } } > Decommissioned datanode is counted in service cause datanode allcating failure > ------------------------------------------------------------------------------ > > Key: HDFS-12820 > URL: https://issues.apache.org/jira/browse/HDFS-12820 > Project: Hadoop HDFS > Issue Type: Bug > Components: block placement > Affects Versions: 2.4.0 > Reporter: Gang Xie > > When allocate a datanode when dfsclient write with considering the load, it checks if the datanode is overloaded by calculating the average xceivers of all the in service datanode. But if the datanode is decommissioned and become dead, it's still treated as in service, which make the average load much more than the real one especially when the number of the decommissioned datanode is great. In our cluster, 180 datanode, and 100 of them decommissioned, and the average load is 17. This failed all the datanode allocation. > private void subtract(final DatanodeDescriptor node) { > capacityUsed -= node.getDfsUsed(); > blockPoolUsed -= node.getBlockPoolUsed(); > xceiverCount -= node.getXceiverCount(); > {color:red} if (!(node.isDecommissionInProgress() || node.isDecommissioned())) {{color} > nodesInService--; > nodesInServiceXceiverCount -= node.getXceiverCount(); > capacityTotal -= node.getCapacity(); > capacityRemaining -= node.getRemaining(); > } else { > capacityTotal -= node.getDfsUsed(); > } > cacheCapacity -= node.getCacheCapacity(); > cacheUsed -= node.getCacheUsed(); > } -- This message was sent by Atlassian JIRA (v6.4.14#64029) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org