Date: Fri, 4 Jan 2013 10:34:26 +0000 (UTC)
From: "Hudson (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-4270) Replications of the highest priority should be allowed to choose a source datanode that has reached its max replication limit

    [ https://issues.apache.org/jira/browse/HDFS-4270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13543771#comment-13543771 ]

Hudson commented on HDFS-4270:
------------------------------

Integrated in Hadoop-Yarn-trunk #86 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/86/])
HDFS-4270. Introduce soft and hard limits for max replication so that replications of the highest priority are allowed to choose a source datanode that has reached its soft limit but not the hard limit. Contributed by Derek Dagit. (Revision 1428739)

Result = FAILURE
szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1428739
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java
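For context, a minimal sketch (not part of the commit) of reading the two limits named in the commit message above from an HDFS configuration. The soft-limit key dfs.namenode.replication.max-streams is the pre-existing property; the hard-limit key name and both default values are assumptions for this sketch, so check the DFSConfigKeys.java change listed above for the authoritative names.

import org.apache.hadoop.conf.Configuration;

// Sketch only: reads the soft and hard replication-stream limits.
public class ReplicationLimitsSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Pre-existing soft limit on concurrent replication streams per datanode.
    int softLimit = conf.getInt("dfs.namenode.replication.max-streams", 2);
    // Hard limit introduced by this change; this key name and default
    // value are assumptions for the sketch, not taken from the patch.
    int hardLimit = conf.getInt("dfs.namenode.replication.max-streams-hard-limit", 4);
    System.out.println("soft limit = " + softLimit + ", hard limit = " + hardLimit);
  }
}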
> Replications of the highest priority should be allowed to choose a source datanode that has reached its max replication limit
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-4270
>                 URL: https://issues.apache.org/jira/browse/HDFS-4270
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 3.0.0, 0.23.5
>            Reporter: Derek Dagit
>            Assignee: Derek Dagit
>            Priority: Minor
>             Fix For: 3.0.0, 2.0.3-alpha
>
>         Attachments: HDFS-4270-branch-0.23.patch, HDFS-4270-branch-0.23.patch, HDFS-4270.patch, HDFS-4270.patch, HDFS-4270.patch, HDFS-4270.patch
>
> Blocks that have been identified as under-replicated are placed on one of several priority queues. The highest-priority queue is essentially reserved for situations in which only one replica of the block exists, meaning it should be replicated as soon as possible.
> The ReplicationMonitor periodically computes replication work, and a call to BlockManager#chooseUnderReplicatedBlocks selects a given number of under-replicated blocks, choosing blocks from the highest-priority queue first and working down to the lowest-priority queue.
> In the subsequent call to BlockManager#computeReplicationWorkForBlocks, a source for the replication is chosen from among the datanodes that hold an available copy of the needed block. This is done in BlockManager#chooseSourceDatanode.
> chooseSourceDatanode's job is to choose the source datanode for the replication: it picks a random datanode from among the available datanodes that have not reached their replication limit, preferring datanodes that are currently decommissioning.
> However, the block's priority queue does not inform this logic: if a datanode holds the last remaining replica of a block and has already reached its replication limit, the node is dismissed outright and the replication is not scheduled.
> In some situations this could lead to data loss, as the last remaining replica could disappear if the opportunity to schedule a replication is missed. It would be better to waive the max replication limit for highest-priority block replications.
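To make the proposed waiver concrete, here is a small self-contained sketch of the eligibility rule described above: a datanode over the soft limit may still serve as a source, but only for a highest-priority (last-replica) block, and the hard limit is never waived. The helper name mayServeAsSource and the constant values are invented for illustration; the real logic lives in BlockManager#chooseSourceDatanode.

public class SourceLimitCheckSketch {
  static final int SOFT_LIMIT = 2;             // illustrative soft limit
  static final int HARD_LIMIT = 4;             // illustrative hard limit
  static final int QUEUE_HIGHEST_PRIORITY = 0; // queue of last-replica blocks

  // True if a datanode with this many queued replications may be used
  // as a replication source for a block of the given priority.
  static boolean mayServeAsSource(int queuedReplications, int priority) {
    if (queuedReplications >= HARD_LIMIT) {
      return false; // the hard limit always applies
    }
    if (queuedReplications >= SOFT_LIMIT) {
      // Over the soft limit: acceptable only for last-replica blocks.
      return priority == QUEUE_HIGHEST_PRIORITY;
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(mayServeAsSource(1, 3)); // true: under the soft limit
    System.out.println(mayServeAsSource(2, 3)); // false: over soft limit, ordinary priority
    System.out.println(mayServeAsSource(2, 0)); // true: soft limit waived for the last replica
    System.out.println(mayServeAsSource(4, 0)); // false: hard limit reached
  }
}

The hard ceiling bounds how much extra work even an emergency replication can place on a single busy datanode, which is the trade-off the soft/hard split described in the commit message is meant to capture.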