X-Original-To: apmail-hadoop-hdfs-issues-archive@minotaur.apache.org
Delivered-To: apmail-hadoop-hdfs-issues-archive@minotaur.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
    by minotaur.apache.org (Postfix) with SMTP id 0422617F56;
    Tue, 31 Mar 2015 12:15:11 +0000 (UTC)
Received: (qmail 72749 invoked by uid 500); 31 Mar 2015 12:14:58 -0000
Delivered-To: apmail-hadoop-hdfs-issues-archive@hadoop.apache.org
Received: (qmail 72694 invoked by uid 500); 31 Mar 2015 12:14:58 -0000
Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hdfs-issues@hadoop.apache.org
Delivered-To: mailing list hdfs-issues@hadoop.apache.org
Received: (qmail 72682 invoked by uid 99); 31 Mar 2015 12:14:58 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28)
    by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 31 Mar 2015 12:14:58 +0000
Date: Tue, 31 Mar 2015 12:14:58 +0000 (UTC)
From: "Hudson (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-7742) favoring decommissioning node for replication can cause a block to stay underreplicated for long periods
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394


    [ https://issues.apache.org/jira/browse/HDFS-7742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14388432#comment-14388432 ]

Hudson commented on HDFS-7742:
------------------------------

SUCCESS: Integrated in Hadoop-Yarn-trunk #883 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/883/])
HDFS-7742.
Favoring decommissioning node for replication can cause a block to stay (kihwal: rev 04ee18ed48ceef34598f954ff40940abc9fde1d2)
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> favoring decommissioning node for replication can cause a block to stay underreplicated for long periods
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-7742
>                 URL: https://issues.apache.org/jira/browse/HDFS-7742
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.6.0
>            Reporter: Nathan Roberts
>            Assignee: Nathan Roberts
>             Fix For: 2.7.0
>
>         Attachments: HDFS-7742-v0.patch
>
>
> When choosing a source node to replicate a block from, a decommissioning node is favored. The reason for the favoritism is that decommissioning nodes aren't servicing any writes, so in theory they are less loaded.
> However, the same selection algorithm also tries to make sure it doesn't get "stuck" on any particular node:
> {noformat}
>       // switch to a different node randomly
>       // this to prevent from deterministically selecting the same node even
>       // if the node failed to replicate the block on previous iterations
> {noformat}
> Unfortunately, the decommissioning check comes before this randomization, so the algorithm can get stuck trying to replicate from a decommissioning node. We've seen this in practice: a decommissioning datanode was failing to replicate a block for many days even though other viable replicas of the block were available.
> Given that we limit the number of streams we'll assign to a given node (default soft limit of 2, hard limit of 4), it doesn't seem like favoring a decommissioning node has significant benefit, i.e.,
> when there is significant replication work to do, we'll quickly hit the stream limit of the decommissioning nodes and use other nodes in the cluster anyway; when there isn't significant replication work, then in theory we've got plenty of replication bandwidth available, so choosing a decommissioning node isn't much of a win.
> I see two choices:
> 1) Change the algorithm to still favor decommissioning nodes, but with some level of randomness that avoids always selecting the decommissioning node
> 2) Remove the favoritism for decommissioning nodes
> I prefer #2. It simplifies the algorithm, and given the other throttles we have in place, I'm not sure there is a significant benefit to selecting decommissioning nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
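
For readers of the archive, the ordering problem described in the issue can be illustrated with a small model. This is only a sketch under stated assumptions, not the BlockManager code and not the attached patch: the class SourceChoiceSketch, the Node type, and the chooseSource method are hypothetical names, and the per-node stream limit is reduced to a single constant using the default soft limit of 2 quoted in the description. With favorDecommissioning set to true, the decommissioning check runs before the random switch, so the same node is picked on every pass; with it set to false (choice #2 in the description), the random switch applies to every candidate.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical, simplified model of source-node selection; not the actual BlockManager code. */
class SourceChoiceSketch {

    /** Minimal stand-in for a datanode that holds a replica (hypothetical type). */
    static final class Node {
        final String name;
        final boolean decommissioning;
        final int activeStreams;

        Node(String name, boolean decommissioning, int activeStreams) {
            this.name = name;
            this.decommissioning = decommissioning;
            this.activeStreams = activeStreams;
        }
    }

    /** Default soft limit on replication streams per node, as quoted in the issue. */
    static final int SOFT_STREAM_LIMIT = 2;

    /** Picks a source replica, or returns null if every candidate is at the stream limit. */
    static Node chooseSource(List<Node> replicas, boolean favorDecommissioning, Random random) {
        Node src = null;
        for (Node node : replicas) {
            if (node.activeStreams >= SOFT_STREAM_LIMIT) {
                continue; // throttled: this node already has enough outgoing streams
            }
            if (src == null) {
                src = node; // first eligible candidate
                continue;
            }
            if (favorDecommissioning) {
                if (node.decommissioning) {
                    src = node; // favoritism: a decommissioning node always wins
                    continue;
                }
                if (src.decommissioning) {
                    continue; // never switch away, so the random step below is skipped
                }
            }
            // switch to a different node randomly
            if (random.nextBoolean()) {
                src = node;
            }
        }
        return src;
    }

    public static void main(String[] args) {
        List<Node> replicas = Arrays.asList(
            new Node("dn-decom", true, 0),
            new Node("dn-b", false, 0),
            new Node("dn-c", false, 0));
        Random random = new Random();
        for (boolean favor : new boolean[] {true, false}) {
            Map<String, Integer> counts = new HashMap<>();
            for (int i = 0; i < 1000; i++) {
                counts.merge(chooseSource(replicas, favor, random).name, 1, Integer::sum);
            }
            System.out.println("favorDecommissioning=" + favor + " -> " + counts);
        }
    }
}
```

In this model, a decommissioning node that keeps failing to replicate would be retried indefinitely when the favoritism is on, which matches the "stuck" behavior reported above; the stream-limit check still diverts work to other nodes once the decommissioning node is busy.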