hadoop-hdfs-dev mailing list archives

From "Andy Isaacson (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-3982) report failed replications in DN heartbeat
Date Thu, 27 Sep 2012 02:25:07 GMT
Andy Isaacson created HDFS-3982:
-----------------------------------

             Summary: report failed replications in DN heartbeat
                 Key: HDFS-3982
                 URL: https://issues.apache.org/jira/browse/HDFS-3982
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: data-node
    Affects Versions: 2.0.2-alpha
            Reporter: Andy Isaacson
            Assignee: Andy Isaacson
            Priority: Minor


From HDFS-3931:
{quote}
# The test corrupts 2/3 replicas.
# The client reports a bad block.
# The NN asks a DN to re-replicate, and randomly picks the other corrupt replica.
# The DN notices the incoming replica is corrupt and reports it as a bad block, but does not inform the NN that re-replication failed.
# The NN keeps the block on pendingReplications.
# The BP scanner wakes up on both DNs with corrupt blocks, and both report corruption. The NN reports both as duplicates, one from the client and one from the DN report above. Since the block is on pendingReplications, the NN does not schedule another replication.

Todd wrote:
I can think of a few ways to fix this:
...
 2) Add a field to the DN heartbeat which reports back a failed replication for a given block.
The NN would use this to decrement the pendingReplication count, which would cause a new replication
attempt to be made if it was still under-replicated.

This jira tracks implementing the DN heartbeat replication failure report.
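
To make the proposal concrete, here is a minimal sketch of the bookkeeping described above. It is a toy model, not Hadoop code: the class PendingReplicationSketch, its methods, and the failedBlocks heartbeat field are all hypothetical names chosen for illustration. It assumes the NN counts in-flight attempts toward the replication target, which is why an unreported failure leaves the block stuck.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of NN pending-replication bookkeeping. Hypothetical; not the Hadoop API. */
class PendingReplicationSketch {
    // blockId -> replication attempts the NN believes are still in flight
    private final Map<Long, Integer> pending = new HashMap<>();
    // blockId -> good replicas the NN currently knows about
    private final Map<Long, Integer> liveReplicas = new HashMap<>();
    private final int targetReplication;

    PendingReplicationSketch(int targetReplication) {
        this.targetReplication = targetReplication;
    }

    void setLiveReplicas(long blockId, int count) {
        liveReplicas.put(blockId, count);
    }

    int pendingCount(long blockId) {
        return pending.getOrDefault(blockId, 0);
    }

    /**
     * Schedules just enough replications to reach the target, counting
     * in-flight attempts as if they will succeed. Returns how many new
     * attempts were scheduled.
     */
    int scheduleReplications(long blockId) {
        int live = liveReplicas.getOrDefault(blockId, 0);
        int needed = targetReplication - live - pendingCount(blockId);
        if (needed <= 0) {
            return 0; // NN thinks enough work is already pending
        }
        pending.merge(blockId, needed, Integer::sum);
        return needed;
    }

    /**
     * Hypothetical heartbeat handler. failedBlocks stands in for the proposed
     * new heartbeat field: one entry per failed replication attempt.
     */
    void handleHeartbeat(List<Long> failedBlocks) {
        for (long blockId : failedBlocks) {
            // Decrement the pending count; drop the entry when it reaches zero.
            pending.computeIfPresent(blockId, (id, n) -> n > 1 ? n - 1 : null);
        }
    }

    public static void main(String[] args) {
        PendingReplicationSketch nn = new PendingReplicationSketch(3);
        nn.setLiveReplicas(42L, 1); // 2 of 3 replicas are corrupt

        System.out.println(nn.scheduleReplications(42L)); // 2
        // Without a failure report the NN schedules nothing further.
        System.out.println(nn.scheduleReplications(42L)); // 0

        // The DN reports one failed attempt in its heartbeat.
        nn.handleHeartbeat(Arrays.asList(42L));
        System.out.println(nn.scheduleReplications(42L)); // 1
    }
}
```

In this sketch the second call schedules nothing because the failed attempt is still counted as pending; only the failure report in the heartbeat frees the slot and lets the NN try again.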

