From: "Raghu Angadi (JIRA)"
To: core-dev@hadoop.apache.org
Date: Fri, 31 Oct 2008 11:41:46 -0700 (PDT)
Subject: [jira] Issue Comment Edited: (HADOOP-4540) An invalidated block should be removed from the blockMap
Message-ID: <1834770839.1225478506480.JavaMail.jira@brutus>
In-Reply-To: <2009888801.1225317764190.JavaMail.jira@brutus>

    [ https://issues.apache.org/jira/browse/HADOOP-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644391#action_12644391 ]

rangadi edited comment on HADOOP-4540 at 10/31/08 11:40 AM:
-----------------------------------------------------------------

(Edit: corrected the JIRA number referred to.)

I think this was the policy even in the pre-0.17.0 NameNode, i.e. blocks were deleted only lazily from blocksMap. Whether HADOOP-4556 has always been there, or was made more probable by another policy, I am not sure.

bq. My proposal is to remove a replica from the blocks map when it is marked as "invalid" (i.e., when it is moved to the recentInvalidateSet) as a result of over-replication. Also, when a block report comes in and a new replica is found but it is marked as invalid, this new replica does not get added to the blocks map.

This probably needs more detail. We have many maps: blocksMap, neededReplications, excessReplications, etc. These are all supposed to be consistent in some way, but what the consistency requirements are, or how they are enforced, is not explicitly defined anywhere. I am afraid that if we make one isolated change now, it is very hard to say for sure that we are not introducing issues similar to HADOOP-4556. We could probably do something smaller to avoid HADOOP-4556. But to change a policy that has been there since the beginning, as this jira proposes, I think we need to consider more. I propose we write down the maps involved and their relations (when and why a block moves to and from these maps, etc.).
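To make the quoted proposal concrete, here is a minimal sketch of the two changes (the invalidation path and the block-report path). This is an illustration only: the names blocksMap and recentInvalidateSet follow the discussion above, but the class, the method signatures, and the use of plain strings for block and datanode ids are assumptions, not the actual FSNamesystem code.

{code}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the proposed policy; not the real FSNamesystem.
class BlockMapSketch {
  // block id -> datanodes believed to hold a replica
  private final Map<String, Set<String>> blocksMap = new HashMap<>();
  // datanode -> blocks scheduled for deletion on that node
  private final Map<String, Set<String>> recentInvalidateSet = new HashMap<>();

  // Called when an over-replicated replica is chosen for deletion.
  void invalidateReplica(String block, String datanode) {
    recentInvalidateSet.computeIfAbsent(datanode, k -> new HashSet<>()).add(block);
    // Proposed change: drop the location from blocksMap immediately,
    // instead of waiting for the next block report, so getBlockLocations
    // never returns a replica that is scheduled for deletion.
    Set<String> locations = blocksMap.get(block);
    if (locations != null) {
      locations.remove(datanode);
    }
  }

  // Called for each replica listed in a datanode's block report.
  void processReportedReplica(String block, String datanode) {
    // Proposed change: a replica that is still pending invalidation on
    // this datanode must not be re-added to blocksMap by a block report.
    Set<String> pending = recentInvalidateSet.get(datanode);
    if (pending != null && pending.contains(block)) {
      return;
    }
    blocksMap.computeIfAbsent(block, k -> new HashSet<>()).add(datanode);
  }
}
{code}

Even in this toy form, the sketch shows the coupling described above: the two structures must be updated together on both paths, and missing either update reproduces exactly the kind of inconsistency this jira and HADOOP-4556 are about.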
> An invalidated block should be removed from the blockMap
> --------------------------------------------------------
>
>                 Key: HADOOP-4540
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4540
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>            Priority: Blocker
>             Fix For: 0.18.3
>
>
> Currently, when the namenode schedules an over-replicated block for deletion, the replica to be deleted does not get removed from the blockMap immediately. Instead, it gets removed when the next block report comes in. This causes three problems:
> 1. getBlockLocations may return locations that do not contain the block;
> 2. over-replication due to unsuccessful deletion cannot be detected, as described in HADOOP-4477;
> 3. the number of blocks shown on the dfs web UI does not get updated on a source node when a large number of blocks have been moved from the source node to a target node, for example when running the balancer.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.