hadoop-hdfs-issues mailing list archives

From "Uma Maheswara Rao G (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3982) report failed replications in DN heartbeat
Date Thu, 27 Sep 2012 17:42:08 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464910#comment-13464910 ]

Uma Maheswara Rao G commented on HDFS-3982:
-------------------------------------------

{quote}
since block is on pendingReplications, NN does not schedule another replication.
{quote}
Once the pending replication times out, doesn't the NN schedule the replication again?
The NN moves blocks from pendingReplications back to neededReplications when they time out,
and on a successful replication the block is removed from pendingReplications anyway.
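A minimal sketch of the timeout behaviour described above, assuming one timeout per block and no other NameNode state. The class and method names (PendingReplicationSketch, scheduleNext, checkTimeouts, etc.) are hypothetical and do not mirror the real HDFS classes. The sketch only illustrates that a timed-out entry leaves pendingReplications and becomes eligible for scheduling again.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Queue;

// Hypothetical, simplified model of the NameNode's replication queues;
// not the real HDFS classes.
class PendingReplicationSketch {
    private final long timeoutMillis;
    // blockId -> time at which a DN was asked to replicate it
    private final Map<Long, Long> pendingReplications = new HashMap<>();
    // blocks waiting to be (re)scheduled for replication
    private final Queue<Long> neededReplications = new ArrayDeque<>();

    PendingReplicationSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    void needReplication(long blockId) {
        neededReplications.add(blockId);
    }

    // Hand the next needed block to a DN and track it as pending.
    Long scheduleNext(long nowMillis) {
        Long blockId = neededReplications.poll();
        if (blockId != null) {
            pendingReplications.put(blockId, nowMillis);
        }
        return blockId;
    }

    // A successful replication simply clears the pending entry.
    void replicationSucceeded(long blockId) {
        pendingReplications.remove(blockId);
    }

    // Periodic check: timed-out entries move back to neededReplications.
    void checkTimeouts(long nowMillis) {
        Iterator<Map.Entry<Long, Long>> it = pendingReplications.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Long> e = it.next();
            if (nowMillis - e.getValue() >= timeoutMillis) {
                it.remove();
                neededReplications.add(e.getKey());
            }
        }
    }

    boolean isPending(long blockId) {
        return pendingReplications.containsKey(blockId);
    }

    boolean isNeeded(long blockId) {
        return neededReplications.contains(blockId);
    }
}
```

In this model a block that fails silently stays pending only until checkTimeouts runs past the timeout; after that it is back in neededReplications.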

Currently, if the replication factor equals the cluster size, the block won't be re-replicated,
because there is no new node to copy it to: every existing node already holds a replica (corrupt
or good). And until there are enough good replicas, the NN won't invalidate any replica. See
HDFS-3586 for some details. I am not sure whether this issue is the same as or similar to that one; please take a look and confirm.
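To illustrate the case where the replication factor equals the cluster size, here is a toy model. It assumes that a replication target must be a live DN holding no replica of the block (good or corrupt), and that corrupt replicas are invalidated only once the good-replica count reaches the replication factor. TargetChooserSketch and its methods are hypothetical; the real block placement policy also weighs racks, load and free space.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical toy model; the real HDFS placement policy is far richer.
class TargetChooserSketch {
    // A node is an eligible target only if it holds no replica of the
    // block at all, whether that replica is good or corrupt.
    static List<String> eligibleTargets(List<String> liveNodes, Set<String> nodesWithReplica) {
        List<String> targets = new ArrayList<>();
        for (String node : liveNodes) {
            if (!nodesWithReplica.contains(node)) {
                targets.add(node);
            }
        }
        return targets;
    }

    // Simplified rule from the comment above: corrupt replicas are
    // invalidated only once there are enough good replicas.
    static boolean canInvalidateCorrupt(int goodReplicas, int replicationFactor) {
        return goodReplicas >= replicationFactor;
    }
}
```

With three live DNs that all hold a replica of a block whose replication factor is 3, eligibleTargets returns an empty list and canInvalidateCorrupt(1, 3) is false, so in this model the block stays stuck until a fourth node joins.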
                
> report failed replications in DN heartbeat
> ------------------------------------------
>
>                 Key: HDFS-3982
>                 URL: https://issues.apache.org/jira/browse/HDFS-3982
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: 2.0.2-alpha
>            Reporter: Andy Isaacson
>            Assignee: Andy Isaacson
>            Priority: Minor
>
> From HDFS-3931:
> {quote}
> # The test corrupts 2/3 replicas.
> # client reports a bad block.
> # NN asks a DN to re-replicate, and randomly picks the other corrupt replica.
> # DN notices the incoming replica is corrupt and reports it as a bad block, but does not inform the NN that re-replication failed.
> # NN keeps the block on pendingReplications.
> # The BP (block pool) scanner wakes up on both DNs with corrupt replicas, and both report the corruption. The NN reports both as duplicates, one from the client and one from the DN report above.
> # Since the block is on pendingReplications, the NN does not schedule another replication.
> Todd wrote:
> I can think of a few ways to fix this:
> ...
>  2) Add a field to the DN heartbeat which reports back a failed replication for a given block. The NN would use this to decrement the pendingReplication count, which would cause a new replication attempt to be made if it was still under-replicated.
> This jira tracks implementing the DN heartbeat replication failure report.
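Todd's option 2 might look roughly like the sketch below. Everything here is hypothetical: the heartbeat field (failedReplications), the NameNode-side maps and the re-queuing rule are assumptions for illustration, not the actual HDFS heartbeat protocol or the change made for this JIRA.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical heartbeat payload with the proposed extra field: blocks this
// DN was asked to replicate but could not (e.g. the source replica was corrupt).
class HeartbeatSketch {
    final String datanode;
    final List<Long> failedReplications;

    HeartbeatSketch(String datanode, List<Long> failedReplications) {
        this.datanode = datanode;
        this.failedReplications = failedReplications;
    }
}

// Hypothetical NameNode-side bookkeeping; not the real BlockManager.
class FailureReportingNameNodeSketch {
    final Map<Long, Integer> pendingCount = new HashMap<>();     // blockId -> replications in flight
    final Map<Long, Integer> liveReplicas = new HashMap<>();     // blockId -> known good replicas
    final Map<Long, Integer> expectedReplicas = new HashMap<>(); // blockId -> replication factor
    final Queue<Long> neededReplications = new ArrayDeque<>();

    void processHeartbeat(HeartbeatSketch hb) {
        for (long blockId : hb.failedReplications) {
            Integer pending = pendingCount.get(blockId);
            if (pending == null) {
                continue; // not pending (already timed out or succeeded): nothing to undo
            }
            // Decrement the in-flight count instead of waiting for the timeout.
            if (pending <= 1) {
                pendingCount.remove(blockId);
            } else {
                pendingCount.put(blockId, pending - 1);
            }
            // If the block is still under-replicated, queue a new attempt now.
            int live = liveReplicas.getOrDefault(blockId, 0);
            int inFlight = pendingCount.getOrDefault(blockId, 0);
            int expected = expectedReplicas.getOrDefault(blockId, 0);
            if (live + inFlight < expected && !neededReplications.contains(blockId)) {
                neededReplications.add(blockId);
            }
        }
    }
}
```

Compared with relying only on the pending-replication timeout, a report like this would let the NN retry as soon as the failing DN's next heartbeat arrives.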

--
This message is automatically generated by JIRA.
