hadoop-common-dev mailing list archives

From "lohit vijayarenu (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2065) Replication policy for corrupted block
Date Mon, 21 Apr 2008 21:57:21 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12591080#action_12591080 ]

lohit vijayarenu commented on HADOOP-2065:
------------------------------------------

When a datanode reports a block as corrupt, instead of deleting it we mark the (datanode, block)
replica as corrupt and request replication of this block.

- Add a new synchronized method, something like FSNameSystem.markAsCorrupt(Block, DatanodeDescriptor),
which marks the replica on that particular datanode as corrupt. It would also add the block
to the neededReplication queue.
- Each DatanodeDescriptor holds a set of corrupt blocks and provides methods to look up a
given block.
- Modify the NumReplicas class to filter out nodes holding such replicas and report them via
corruptReplicas(), similar to decommissionedReplicas().
- While choosing the source node for replication in chooseSourceDatanode(), we use only the
copies that are not marked as corrupt.
- Inside addStoredBlock(), whenever we add a new node (replica), we also check whether we
already have a corrupt copy. If we have reached the desired replication for this block and
the corrupt copy is in excess, we invalidate it there.
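
The steps above could be sketched roughly as follows. This is only a toy model to make the
bookkeeping concrete, not the actual namenode code: the class CorruptReplicaSketch, the use
of long block ids and String datanode names, and the needsReplication() and invalidated()
helpers are all assumptions made for illustration. Only the method names markAsCorrupt,
corruptReplicas, chooseSourceDatanode and addStoredBlock come from the proposal.

```java
import java.util.*;

// Hypothetical, simplified stand-in for the namenode bookkeeping; not Hadoop code.
class CorruptReplicaSketch {
    private final int desiredReplication;
    // block id -> datanodes that store a replica (corrupt or not)
    private final Map<Long, Set<String>> replicas = new HashMap<>();
    // datanode -> blocks whose replica on that node is marked corrupt
    private final Map<String, Set<Long>> corruptByNode = new HashMap<>();
    private final Set<Long> neededReplication = new HashSet<>();
    private final List<String> invalidated = new ArrayList<>();

    CorruptReplicaSketch(int desiredReplication) {
        this.desiredReplication = desiredReplication;
    }

    private boolean isCorrupt(long block, String node) {
        return corruptByNode.getOrDefault(node, Collections.emptySet()).contains(block);
    }

    // Mark one (datanode, block) replica as corrupt and queue the block for replication.
    synchronized void markAsCorrupt(long block, String node) {
        if (!replicas.getOrDefault(block, Collections.emptySet()).contains(node)) {
            return; // unknown replica, nothing to mark
        }
        corruptByNode.computeIfAbsent(node, k -> new HashSet<>()).add(block);
        neededReplication.add(block);
    }

    // Number of replicas of this block that are marked corrupt.
    synchronized int corruptReplicas(long block) {
        int n = 0;
        for (String node : replicas.getOrDefault(block, Collections.emptySet())) {
            if (isCorrupt(block, node)) {
                n++;
            }
        }
        return n;
    }

    // Number of replicas of this block that are not marked corrupt.
    synchronized int liveReplicas(long block) {
        return replicas.getOrDefault(block, Collections.emptySet()).size()
                - corruptReplicas(block);
    }

    // Pick a replication source among the copies not marked corrupt, or null if none.
    synchronized String chooseSourceDatanode(long block) {
        for (String node : replicas.getOrDefault(block, Collections.emptySet())) {
            if (!isCorrupt(block, node)) {
                return node;
            }
        }
        return null;
    }

    // Record a new replica; once enough good copies exist, invalidate the corrupt ones.
    synchronized void addStoredBlock(long block, String node) {
        Set<String> holders = replicas.computeIfAbsent(block, k -> new TreeSet<>());
        holders.add(node);
        if (liveReplicas(block) < desiredReplication) {
            neededReplication.add(block);
            return;
        }
        neededReplication.remove(block);
        for (String holder : new ArrayList<>(holders)) {
            if (isCorrupt(block, holder)) {
                holders.remove(holder);
                corruptByNode.get(holder).remove(block);
                invalidated.add(block + "@" + holder);
            }
        }
    }

    synchronized boolean needsReplication(long block) {
        return neededReplication.contains(block);
    }

    synchronized List<String> invalidated() {
        return invalidated;
    }
}
```

In this sketch a corrupt replica is kept until addStoredBlock() sees the desired number of
good copies, so a block whose replicas are all corrupt is never invalidated.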

I think this would take care of retaining all corrupt copies. The one case where I see a
problem is the pendingReplication thread, which would keep looping, trying to replicate
corrupt blocks. We could add a check there: if the number of pending replicas for a block
equals the number of corrupt copies, remove the block from pendingReplication.
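
A minimal sketch of that check, taking the condition literally (pending count equal to
corrupt count). The class PendingReplicationSketch, its add(), sweepTimedOut() and size()
methods, and the assumption that every pending entry has already timed out when the sweep
runs are all hypothetical and only illustrate the idea:

```java
import java.util.*;

// Hypothetical model of the pending-replication bookkeeping; not Hadoop code.
class PendingReplicationSketch {
    // block id -> number of replication requests still outstanding
    private final Map<Long, Integer> pending = new TreeMap<>();

    synchronized void add(long block, int numReplicas) {
        pending.merge(block, numReplicas, Integer::sum);
    }

    synchronized int size() {
        return pending.size();
    }

    // One sweep over entries that are all assumed to have timed out.
    // Returns the blocks to put back on the replication queue; a block whose
    // pending count equals its corrupt-copy count is dropped instead.
    synchronized List<Long> sweepTimedOut(Map<Long, Integer> corruptCounts) {
        List<Long> requeue = new ArrayList<>();
        Iterator<Map.Entry<Long, Integer>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Integer> entry = it.next();
            long block = entry.getKey();
            int pendingCount = entry.getValue();
            int corruptCount = corruptCounts.getOrDefault(block, 0);
            it.remove(); // timed-out entries leave the pending map either way
            if (pendingCount != corruptCount) {
                requeue.add(block);
            }
        }
        return requeue;
    }
}
```

With this check, a block whose pending count matches its corrupt copies is dropped rather
than re-queued on every timeout.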

> Replication policy for corrupted block 
> ---------------------------------------
>
>                 Key: HADOOP-2065
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2065
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.14.1
>            Reporter: Koji Noguchi
>            Assignee: lohit vijayarenu
>             Fix For: 0.18.0
>
>
> Thanks to HADOOP-1955, even if one of the replicas is corrupted, the block should get replicated from a good replica relatively fast.
> Created this ticket to continue the discussion from http://issues.apache.org/jira/browse/HADOOP-1955#action_12531162.
> bq. 2. Delete corrupted source replica
> bq. 3. If all replicas are corrupt, stop replication.
> For (2), it would be nice if the namenode could delete the corrupted block when there is a good replica on other nodes.
> For (3), I would prefer that the namenode still replicate the block.
> Before 0.14, if the file was corrupted, users were still able to pull the data and decide whether they wanted to delete those files. (HADOOP-2063)
> In 0.14 and later, we cannot/don't replicate these blocks, so they eventually get lost.
> To make matters worse, if the corrupted file is accessed, all the corrupted replicas but one would be deleted, and the block would stay at a replication factor of 1 forever.
>  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

