hadoop-hdfs-issues mailing list archives

From "VinayaKumar B (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2770) Block reports may mark corrupt blocks pending deletion as non-corrupt
Date Sun, 01 Apr 2012 07:43:07 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243676#comment-13243676 ]

VinayaKumar B commented on HDFS-2770:
-------------------------------------

Hi Todd,

I think corrupt replicas are invalidated only if the number of good replicas is greater than or
equal to the replication factor. But you said they are invalidated immediately.
{code}
    // Add this replica to corruptReplicas Map
    corruptReplicas.addToCorruptReplicasMap(storedBlock, node, reason);
    if (countNodes(storedBlock).liveReplicas() >= inode.getReplication()) {
      // the block is over-replicated so invalidate the replicas immediately
      invalidateBlock(storedBlock, node);
    } else if (namesystem.isPopulatingReplQueues()) {
      // add the block to neededReplication
      updateNeededReplications(storedBlock, -1, 0);
    }
{code}

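If the number of datanodes equals the replication factor and one replica is marked corrupt, then
the live replica count stays below the replication factor, so that replica will never be
invalidated. Re-replication won't happen either, because there is no spare datanode to hold a
new replica.
The same issue is reported in HDFS-2932.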
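
To make the stuck case concrete, here is a minimal standalone sketch of the branch quoted above. It is not the BlockManager code: the class, the enum and both helper methods are hypothetical, and the plain int/boolean parameters stand in for countNodes(storedBlock).liveReplicas(), inode.getReplication() and namesystem.isPopulatingReplQueues(). It also assumes that a datanode already holding a replica of the block, good or corrupt, cannot be chosen as a replication target.

```java
public class CorruptReplicaDecision {
    enum Action { INVALIDATE, QUEUE_FOR_REPLICATION, NONE }

    // Hypothetical stand-in for the quoted branch: decides what happens
    // to a replica right after it has been marked corrupt.
    static Action onCorruptReplica(int liveReplicas, int replication,
                                   boolean populatingReplQueues) {
        if (liveReplicas >= replication) {
            // enough good replicas remain, so the corrupt one is invalidated
            return Action.INVALIDATE;
        } else if (populatingReplQueues) {
            // otherwise the block is only queued for re-replication
            return Action.QUEUE_FOR_REPLICATION;
        }
        return Action.NONE;
    }

    // Assumption: a new replica needs a datanode that holds no replica
    // of this block at all, neither a good one nor a corrupt one.
    static boolean hasReplicationTarget(int totalDatanodes, int liveReplicas,
                                        int corruptReplicas) {
        return totalDatanodes - liveReplicas - corruptReplicas > 0;
    }

    public static void main(String[] args) {
        int datanodes = 3;
        int replication = 3;
        int corrupt = 1;
        int live = replication - corrupt; // 2 good replicas remain

        // The corrupt replica is not invalidated ...
        System.out.println(onCorruptReplica(live, replication, true)); // QUEUE_FOR_REPLICATION
        // ... and the queued replication has nowhere to go.
        System.out.println(hasReplicationTarget(datanodes, live, corrupt)); // false
    }
}
```

With 3 datanodes and replication 3, the block is queued for replication but no target exists, so under these assumptions the corrupt replica stays in place and the live count never reaches the replication factor.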
                
> Block reports may mark corrupt blocks pending deletion as non-corrupt
> ---------------------------------------------------------------------
>
>                 Key: HDFS-2770
>                 URL: https://issues.apache.org/jira/browse/HDFS-2770
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.23.0
>            Reporter: Todd Lipcon
>            Priority: Critical
>
> It seems like HDFS-900 may have regressed in trunk since it was committed without a regression test. In HDFS-2742 I saw the following sequence of events:
> - A block at replication 2 had one of its replicas marked as corrupt on the NN
> - NN scheduled deletion of that replica in {{invalidateWork}}, and removed it from the block map
> - The DN hosting that block sent a block report, which caused the replica to get re-added to the block map as if it were good
> - The deletion request was passed to the DN and it deleted the block
> - Now we're in a bad state, where the NN temporarily thinks that it has two good replicas, but in fact one of them has been deleted. If we lower replication of this block at this time, the one good remaining replica may be deleted.
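
The sequence in the issue description can be replayed with a small toy model. This is only a sketch under simplifying assumptions: the class, its fields and its methods are hypothetical and do not correspond to NameNode code, and a replica is identified simply by the name of the datanode that holds it.

```java
import java.util.HashSet;
import java.util.Set;

public class StaleBlockReportRace {
    // NN-side view of where good replicas of one block live (hypothetical model)
    final Set<String> blockMapReplicas = new HashSet<>();
    // deletions scheduled by the NN but not yet delivered to the DN
    final Set<String> pendingDeletion = new HashSet<>();
    // ground truth: datanodes that really have the replica on disk
    final Set<String> replicasOnDisk = new HashSet<>();

    // Replica marked corrupt: schedule deletion, drop it from the block map
    void markCorruptAndScheduleDeletion(String dn) {
        blockMapReplicas.remove(dn);
        pendingDeletion.add(dn);
    }

    // Block report handling that trusts the report and ignores pending deletions
    void processBlockReport(String dn) {
        if (replicasOnDisk.contains(dn)) {
            blockMapReplicas.add(dn);
        }
    }

    // The deletion request reaches the DN, which deletes the replica
    void deliverDeletion(String dn) {
        if (pendingDeletion.remove(dn)) {
            replicasOnDisk.remove(dn);
        }
    }

    // Replays the events from the issue description for a block at replication 2
    static StaleBlockReportRace simulate() {
        StaleBlockReportRace s = new StaleBlockReportRace();
        s.replicasOnDisk.add("dn1");
        s.replicasOnDisk.add("dn2");
        s.blockMapReplicas.add("dn1");
        s.blockMapReplicas.add("dn2");

        s.markCorruptAndScheduleDeletion("dn2"); // replica on dn2 is corrupt
        s.processBlockReport("dn2");             // report re-adds it as if good
        s.deliverDeletion("dn2");                // DN deletes the replica
        return s;
    }

    public static void main(String[] args) {
        StaleBlockReportRace s = simulate();
        System.out.println("NN believes: " + s.blockMapReplicas.size()
                + ", actually on disk: " + s.replicasOnDisk.size());
        // prints "NN believes: 2, actually on disk: 1"
    }
}
```

In this model the NN ends up counting two good replicas while only one exists, which is the bad state described in the last step.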

