hadoop-hdfs-issues mailing list archives

From "Uma Maheswara Rao G (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3162) BlockMap's corruptNodes count and CorruptReplicas map count is not matching.
Date Sat, 31 Mar 2012 20:38:24 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243281#comment-13243281 ]

Uma Maheswara Rao G commented on HDFS-3162:
-------------------------------------------

I don't think this problem is because of append usage.

Looks like this is a race between markBlockAsCorrupt and processOverReplicatedBlocks.

1) The NN detects an over-replicated block and adds it to the invalidates list for DNn.
2) Before the invalidates list is processed, the BlockScanner finds that block corrupted on DNn and reports it to the NN.
3) Before the lock is acquired, the invalidates list gets processed and the block is removed from the blocksMap for DNn.
4) Now markBlockAsCorrupt starts processing:
{code}
// Add this replica to corruptReplicas Map
corruptReplicas.addToCorruptReplicasMap(storedBlockInfo, node);
if (countNodes(storedBlockInfo).liveReplicas() > inode.getReplication()) {
  // the block is over-replicated so invalidate the replicas immediately
  invalidateBlock(storedBlockInfo, node);
} else {
  // add the block to neededReplication
  updateNeededReplications(storedBlockInfo, -1, 0);
}
{code}
Since it finds enough live replicas, it calls invalidateBlock, which tries to remove the storedBlock when there is more than one live replica. That call just returns, because the block was already removed from the blocksMap.

But the replica was already added to the corruptReplicas map (shown in the piece of code above).

So now the corruptReplicas map and the blocksMap disagree about the corrupt replica counts.
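The race can be replayed with a minimal sketch. The class and method names below are simplified stand-ins for illustration only, not the real BlockManager code:

```java
import java.util.*;

// Minimal model of the branch-1 race: the corrupt-replica record is added
// even though the replica was already removed from the blocksMap, so the
// two structures end up disagreeing. Hypothetical simplified names.
class CorruptCountSketch {
    // blocksMap: block -> data-nodes holding a replica
    final Map<String, Set<String>> blocksMap = new HashMap<>();
    // corruptReplicas: block -> data-nodes holding a corrupt replica
    final Map<String, Set<String>> corruptReplicas = new HashMap<>();

    void addReplica(String block, String node) {
        blocksMap.computeIfAbsent(block, b -> new HashSet<>()).add(node);
    }

    // Step 3: invalidates-list processing removes the replica from the
    // blocksMap before markBlockAsCorrupt ever runs.
    void removeReplica(String block, String node) {
        Set<String> nodes = blocksMap.get(block);
        if (nodes != null) nodes.remove(node);
    }

    // Step 4: markBlockAsCorrupt unconditionally records the corrupt
    // replica; the subsequent removal is a no-op because the replica is
    // already gone from the blocksMap, so the call just returns.
    void markBlockAsCorrupt(String block, String node) {
        corruptReplicas.computeIfAbsent(block, b -> new HashSet<>()).add(node);
        removeReplica(block, node); // no-op here: already removed
    }

    int blocksMapCorruptCount(String block, String node) {
        Set<String> nodes = blocksMap.get(block);
        return (nodes != null && nodes.contains(node)) ? 1 : 0;
    }

    int corruptReplicasCount(String block) {
        Set<String> nodes = corruptReplicas.get(block);
        return nodes == null ? 0 : nodes.size();
    }
}
```

Replaying steps 1-4 against this model leaves the blocksMap with 0 corrupt replicas while the corruptReplicas map has 1, matching the log message in the issue description.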

Most likely this exists only on branch-1.

I think this problem is already addressed in trunk.

Code from trunk:
{code}
// Add replica to the data-node if it is not already there
node.addBlock(storedBlock);

// Add this replica to corruptReplicas Map
corruptReplicas.addToCorruptReplicasMap(storedBlock, node, reason);
if (countNodes(storedBlock).liveReplicas() >= inode.getReplication()) {
  // the block is over-replicated so invalidate the replicas immediately
  invalidateBlock(storedBlock, node);
}
{code}

See the first line above: if the block is not already in the blocksMap for that node, it is added back before the corrupt-replica accounting. I think this should solve the problem in trunk.
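Why that ordering avoids the mismatch can be sketched with the same kind of simplified model (hypothetical names, not the real BlockManager code):

```java
import java.util.*;

// Sketch of the trunk behaviour: the replica is re-added to the blocksMap
// before the corrupt-replica accounting, so both structures see the same
// replica and their counts stay consistent. Hypothetical simplified names.
class TrunkOrderSketch {
    final Map<String, Set<String>> blocksMap = new HashMap<>();
    final Map<String, Set<String>> corruptReplicas = new HashMap<>();

    void markBlockAsCorrupt(String block, String node) {
        // "Add replica to the data-node if it is not already there"
        blocksMap.computeIfAbsent(block, b -> new HashSet<>()).add(node);
        // Only then record it as corrupt, so both maps agree.
        corruptReplicas.computeIfAbsent(block, b -> new HashSet<>()).add(node);
    }

    boolean countsConsistent(String block, String node) {
        boolean inBlocksMap =
            blocksMap.getOrDefault(block, Collections.emptySet()).contains(node);
        boolean inCorrupt =
            corruptReplicas.getOrDefault(block, Collections.emptySet()).contains(node);
        return inBlocksMap == inCorrupt;
    }
}
```

Even when the replica was already removed from the blocksMap by the invalidates-list processing, the re-add keeps the two maps in agreement.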
                
> BlockMap's corruptNodes count and CorruptReplicas map count is not matching.
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-3162
>                 URL: https://issues.apache.org/jira/browse/HDFS-3162
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 1.0.0
>            Reporter: suja s
>            Assignee: Uma Maheswara Rao G
>            Priority: Minor
>             Fix For: 1.0.3
>
>
> Even after invalidating the block, the log below keeps coming continuously:
>  
> Inconsistent number of corrupt replicas for blk_1332906029734_1719blockMap has 0 but corrupt replicas map has 1

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
