From: "Raghu Angadi (JIRA)"
To: core-dev@hadoop.apache.org
Date: Fri, 31 Oct 2008 11:41:46 -0700 (PDT)
Subject: [jira] Issue Comment Edited: (HADOOP-4540) An invalidated block should be removed from the blockMap
Message-ID: <1834770839.1225478506480.JavaMail.jira@brutus>
In-Reply-To: <2009888801.1225317764190.JavaMail.jira@brutus>

    [ https://issues.apache.org/jira/browse/HADOOP-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644391#action_12644391 ]

rangadi edited comment on HADOOP-4540 at 10/31/08 11:40 AM:
-----------------------------------------------------------------

(Edit: corrected the JIRA number referred to.)

I think this was the policy even in the pre-0.17.0 NameNode, i.e. blocks were deleted only lazily from blocksMap. Whether HADOOP-4556 has always been there, or was made more probable by another policy, I am not sure.

bq. My proposal is to remove a replica from the blocks map when it is marked as "invalid" (i.e., when it is moved to the recentInvalidateSet) as a result of over-replication. Also, when a block report comes in and a new replica is found but it is marked as invalid, this new replica does not get added to the blocks map.

This probably needs more detail. We have many maps: blocksMap, neededReplications, excessReplications, etc. These are all supposed to be consistent in some way, but what the consistency requirements are, or how they are enforced, is not explicitly defined anywhere. I am afraid that if we make one isolated change now, it is very hard to say for sure that we are not introducing issues similar to HADOOP-4556. We could probably do something smaller to avoid HADOOP-4556. But to change a policy that has been there since the beginning, as this jira proposes, I think we need to consider more. I propose we write down the maps involved and their relations (when and why a block moves to and from these maps, etc.).
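To make the quoted proposal concrete, here is a minimal sketch of the two changes (the invalidation path and the block-report path). This is an illustration only: the names blocksMap and recentInvalidateSet follow the discussion above, but the class, the method signatures, and the use of plain strings for block and datanode ids are assumptions, not the actual FSNamesystem code.

{code}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the proposed policy; not the real FSNamesystem.
class BlockMapSketch {
  // block id -> datanodes believed to hold a replica
  private final Map<String, Set<String>> blocksMap = new HashMap<>();
  // datanode -> blocks scheduled for deletion on that node
  private final Map<String, Set<String>> recentInvalidateSet = new HashMap<>();

  // Called when an over-replicated replica is chosen for deletion.
  void invalidateReplica(String block, String datanode) {
    recentInvalidateSet.computeIfAbsent(datanode, k -> new HashSet<>()).add(block);
    // Proposed change: drop the location from blocksMap immediately,
    // instead of waiting for the next block report, so getBlockLocations
    // never returns a replica that is scheduled for deletion.
    Set<String> locations = blocksMap.get(block);
    if (locations != null) {
      locations.remove(datanode);
    }
  }

  // Called for each replica listed in a datanode's block report.
  void processReportedReplica(String block, String datanode) {
    // Proposed change: a replica that is still pending invalidation on
    // this datanode must not be re-added to blocksMap by a block report.
    Set<String> pending = recentInvalidateSet.get(datanode);
    if (pending != null && pending.contains(block)) {
      return;
    }
    blocksMap.computeIfAbsent(block, k -> new HashSet<>()).add(datanode);
  }
}
{code}

Even in this toy form, the sketch shows the coupling described above: the two structures must be updated together on both paths, and missing either update reproduces exactly the kind of inconsistency this jira and HADOOP-4556 are about.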
> An invalidated block should be removed from the blockMap
> --------------------------------------------------------
>
>                 Key: HADOOP-4540
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4540
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>            Priority: Blocker
>             Fix For: 0.18.3
>
>
> Currently, when the namenode schedules an over-replicated block for deletion, the replica to be deleted does not get removed from the blockMap immediately. Instead, it gets removed when the next block report comes in. This causes three problems:
> 1. getBlockLocations may return locations that do not contain the block;
> 2. over-replication due to unsuccessful deletion cannot be detected, as described in HADOOP-4477;
> 3. the number of blocks shown on the dfs web UI does not get updated on a source node when a large number of blocks have been moved from the source node to a target node, for example when running the balancer.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.