hadoop-hdfs-dev mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Created: (HDFS-900) Corrupt replicas are not tracked correctly through block report from DN
Date Thu, 14 Jan 2010 00:42:54 GMT
Corrupt replicas are not tracked correctly through block report from DN

                 Key: HDFS-900
                 URL: https://issues.apache.org/jira/browse/HDFS-900
             Project: Hadoop HDFS
          Issue Type: Bug
    Affects Versions: 0.22.0
            Reporter: Todd Lipcon
            Priority: Critical
         Attachments: log-commented, to-reproduce.patch

This one is tough to describe, but essentially the following sequence of events occurs:

# A client marks one replica of a block as corrupt by reporting it to the NN
# Replication is then scheduled to make a new replica of this block
# The replication completes, so there are now 3 good replicas and 1 corrupt replica
# The DN holding the corrupt replica sends a block report. Rather than telling this DN to
delete the replica, the NN instead marks it as a new *good* replica of the block, and schedules
deletion of one of the good replicas.
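
To make the sequence above easier to follow, here is a minimal Python sketch that replays the four steps against a toy model. Everything in it is an assumption for illustration: ToyNameNode, replay, the forget_corruption switch and the dn1..dn4 names are hypothetical and do not correspond to real HDFS classes or to the attached patch. The choice of which good replica gets deleted is also arbitrary here; the real selection policy is not modeled.

```python
class ToyNameNode:
    """Toy bookkeeping for a single block. Hypothetical, not HDFS code."""

    def __init__(self, replication):
        self.replication = replication
        self.good = set()      # DNs believed to hold a good replica
        self.corrupt = set()   # DNs reported as holding a corrupt replica
        self.deletions = []    # DNs told to delete their replica, in order

    def add_replica(self, dn):
        # A DN now holds a replica that is believed to be good.
        self.good.add(dn)

    def mark_corrupt(self, dn):
        # Step 1: a client tells the NN that the replica on `dn` is corrupt.
        self.good.discard(dn)
        self.corrupt.add(dn)

    def block_report(self, dn, forget_corruption):
        # Step 4: `dn` reports that it still holds the block.
        if dn in self.corrupt and not forget_corruption:
            # Expected handling: the replica is known corrupt, so delete it.
            self.corrupt.discard(dn)
            self.deletions.append(dn)
            return
        # Behavior described in the report: the replica is counted as good,
        # so the block looks over-replicated and a good replica is deleted.
        self.corrupt.discard(dn)
        self.good.add(dn)
        while len(self.good) > self.replication:
            victim = min(self.good - {dn})  # arbitrary toy choice
            self.good.discard(victim)
            self.deletions.append(victim)


def replay(forget_corruption):
    nn = ToyNameNode(replication=3)
    for dn in ("dn1", "dn2", "dn3"):
        nn.add_replica(dn)
    nn.mark_corrupt("dn1")   # step 1
    nn.add_replica("dn4")    # steps 2-3: now 3 good replicas and 1 corrupt
    nn.block_report("dn1", forget_corruption)  # step 4
    return nn
```

In this toy model, replay(True).deletions is ["dn2"] and dn1 ends up counted as good, which mirrors the reported behavior. replay(False).deletions is ["dn1"], which is the handling one would expect.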

I don't know if this is a data-loss bug in the case of 1 corrupt replica with dfs.replication=2,
but it seems feasible. I will attach a debug log with some commentary marked by '============>',
plus a unit test patch which reproduces this behavior reliably. (It's not a proper
unit test, just some edits to an existing one to show the problem.)

