Mailing-List: contact core-dev-help@hadoop.apache.org; run by ezmlm
Reply-To: core-dev@hadoop.apache.org
Message-ID: <1840921850.1221083684458.JavaMail.jira@brutus>
Date: Wed, 10 Sep 2008 14:54:44 -0700 (PDT)
From: "Konstantin Shvachko (JIRA)"
To: core-dev@hadoop.apache.org
Subject: [jira] Commented: (HADOOP-4071) FSNameSystem.isReplicationInProgress should add an underReplicated block to the neededReplication queue using method "add" not "update"
In-Reply-To: <375154308.1220562944385.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

[ 
https://issues.apache.org/jira/browse/HADOOP-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629989#action_12629989 ]

Konstantin Shvachko commented on HADOOP-4071:
---------------------------------------------

+1

> FSNameSystem.isReplicationInProgress should add an underReplicated block to the neededReplication queue using method "add" not "update"
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4071
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4071
>             Project: Hadoop Core
>          Issue Type: Bug
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.19.0
>
>         Attachments: decommission.patch
>
>
> We have a datanode that did not finish decommissioning for days. It turned out that there was an under-replicated block that was never placed in the neededReplication queue and therefore was never replicated. The following debug line showed the problem:
> " DEBUG org.apache.hadoop.dfs.StateChange: UnderReplicationBlocks.update blk_-7437651423871278837_0 curReplicas 8
> curExpectedReplicas 10 oldReplicas 9 oldExpectedReplicas 10 curPri 2 oldPri 2"
> The block was not in the neededReplication queue. The update method concluded that the block was under-replicated but that its priority level had not changed, so it did not add the block to the neededReplication queue.
> The solution is that, instead of using the update method, the name node should use the add method to add the block to the neededReplication queue. The add method guarantees success if the block is indeed under-replicated.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
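
For readers of the archive, the behavior described in the issue can be illustrated with a small, self-contained Java model. This is only a sketch under stated assumptions, not the Hadoop source: the class name UnderReplicatedQueueSketch, the string block IDs, the three-level priority rule, and the method signatures are all hypothetical and were chosen only to reproduce the values in the debug line (curReplicas 8, oldReplicas 9, expected 10, curPri 2, oldPri 2).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of a set of priority queues for
// under-replicated blocks. Not the actual Hadoop implementation.
class UnderReplicatedQueueSketch {
    // Number of priority levels; also used to mean "not under-replicated".
    static final int LEVEL = 3;
    private final List<Set<String>> queues = new ArrayList<>();

    public UnderReplicatedQueueSketch() {
        for (int i = 0; i < LEVEL; i++) {
            queues.add(new HashSet<>());
        }
    }

    // Assumed priority rule for this sketch: 0 is most urgent.
    public static int priority(int curReplicas, int expectedReplicas) {
        if (curReplicas >= expectedReplicas) return LEVEL;
        if (curReplicas == 0) return 0;
        if (curReplicas * 3 < expectedReplicas) return 1;
        return 2;
    }

    public boolean contains(String block) {
        for (Set<String> q : queues) {
            if (q.contains(block)) return true;
        }
        return false;
    }

    // add: queues the block whenever it is under-replicated.
    public boolean add(String block, int curReplicas, int expectedReplicas) {
        int pri = priority(curReplicas, expectedReplicas);
        return pri != LEVEL && queues.get(pri).add(block);
    }

    // update: only moves the block when its priority level changes, so a
    // block that was never queued stays unqueued if oldPri == curPri.
    public void update(String block, int curReplicas, int expectedReplicas,
                       int curReplicasDelta, int expectedReplicasDelta) {
        int curPri = priority(curReplicas, expectedReplicas);
        int oldPri = priority(curReplicas - curReplicasDelta,
                              expectedReplicas - expectedReplicasDelta);
        if (oldPri != LEVEL && oldPri != curPri) {
            queues.get(oldPri).remove(block);
        }
        if (curPri != LEVEL && oldPri != curPri) {
            queues.get(curPri).add(block);
        }
    }

    public static void main(String[] args) {
        UnderReplicatedQueueSketch q = new UnderReplicatedQueueSketch();
        String blk = "blk_-7437651423871278837_0";
        // Replicas dropped from 9 to 8 of 10: priority stays at 2.
        q.update(blk, 8, 10, -1, 0);
        System.out.println("after update: queued=" + q.contains(blk));
        q.add(blk, 8, 10);
        System.out.println("after add: queued=" + q.contains(blk));
    }
}
```

In this model, update leaves the block unqueued because both priorities evaluate to 2, while add queues it unconditionally once it is found to be under-replicated, which is the distinction the issue description relies on.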