hadoop-hdfs-issues mailing list archives

From "Uma Maheswara Rao G (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3586) Blocks are not getting replicate even DN's are availble.
Date Mon, 02 Jul 2012 18:05:22 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405178#comment-13405178 ]

Uma Maheswara Rao G commented on HDFS-3586:

Thanks, Brahma, for filing the JIRA.

This is similar to HDFS-3493, but the difference here is that more DNs are available than the replication factor requires.
So, ideally, the block should get replicated.
The problem here is that you have 2 live replicas, and the remaining 2 DNs hold a partial block
in RBW. So when the NN tries to replicate, the DNs reject the transfer, saying the block already exists
in RBW. As a result, replication may not happen even though you have more nodes.
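To make the failure mode concrete, here is a minimal, self-contained sketch of the DN side. It is not the real DataNode code: `RbwRejectionSketch`, `DataNodeModel`, `putReplica` and `acceptTransfer` are hypothetical names, and the only rule modelled is the one described above, namely that a DN refuses an incoming replica when it already holds a replica of the same block (here, a partial one in RBW).

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of a DataNode's replica map (not the real HDFS classes).
 * It only illustrates why a replication transfer is refused when a replica
 * of the same block already sits in RBW on the target DN.
 */
public class RbwRejectionSketch {

    enum ReplicaState { FINALIZED, RBW }

    static class DataNodeModel {
        private final String name;
        private final Map<Long, ReplicaState> replicas = new HashMap<>();

        DataNodeModel(String name) { this.name = name; }

        /** Record a replica that is already on this DN (e.g. left over from a failed pipeline). */
        void putReplica(long blockId, ReplicaState state) {
            replicas.put(blockId, state);
        }

        /** Accept an incoming replica only if no replica of this block exists yet. */
        boolean acceptTransfer(long blockId) {
            ReplicaState existing = replicas.get(blockId);
            if (existing != null) {
                System.out.println(name + " rejects block " + blockId
                        + ": replica already exists in state " + existing);
                return false;
            }
            replicas.put(blockId, ReplicaState.FINALIZED);
            return true;
        }
    }

    public static void main(String[] args) {
        long blockId = 1L;
        DataNodeModel dn3 = new DataNodeModel("DN3");
        DataNodeModel dn4 = new DataNodeModel("DN4");
        // Partial replicas left behind by the two slow pipelines.
        dn3.putReplica(blockId, ReplicaState.RBW);
        dn4.putReplica(blockId, ReplicaState.RBW);

        // The NN can only pick DN3 or DN4 as new targets; both refuse.
        int accepted = 0;
        for (DataNodeModel target : new DataNodeModel[] {dn3, dn4}) {
            if (target.acceptTransfer(blockId)) {
                accepted++;
            }
        }
        System.out.println("accepted transfers: " + accepted); // prints "accepted transfers: 0"
    }
}
```

Because every remaining candidate already holds a stale RBW copy, no transfer succeeds and the block stays at 2 live replicas, even though 4 DNs are up.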

I think a possible fix here is to change the condition below
*if (countNodes(b.stored).liveReplicas() >= bc.getReplication()) {*

to something like *if ((countNodes(b.stored).liveReplicas() + countNodes(b.stored).corruptReplicas()) > bc.getReplication()) {*

That way, the extra corrupt replicas (beyond the replication factor) will get invalidated, and replication
can then proceed normally.
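The effect of the proposed change can be checked with plain arithmetic. The sketch below evaluates both conditions on the counts from this report. It is a simplification under stated assumptions: the counts are passed as plain `int`s instead of coming from `countNodes(b.stored)` and `bc.getReplication()`, the class and method names are hypothetical, and it assumes (as the comment implies) that the NN counts the two stale RBW replicas on DN3 and DN4 as corrupt.

```java
/**
 * Hypothetical comparison of the current guard and the proposed guard.
 * In the NN the counts come from countNodes(b.stored) and bc.getReplication();
 * here they are plain ints so the arithmetic is easy to follow.
 */
public class ReplicationGuardSketch {

    /** Current condition: liveReplicas() >= getReplication() (corrupt count unused). */
    static boolean currentGuard(int liveReplicas, int corruptReplicas, int replication) {
        return liveReplicas >= replication;
    }

    /** Proposed condition: liveReplicas() + corruptReplicas() > getReplication(). */
    static boolean proposedGuard(int liveReplicas, int corruptReplicas, int replication) {
        return liveReplicas + corruptReplicas > replication;
    }

    public static void main(String[] args) {
        // Scenario from the report (assumption: RBW copies counted as corrupt):
        // RF=3, 2 live replicas (DN1, DN2), 2 corrupt replicas (DN3, DN4).
        int live = 2;
        int corrupt = 2;
        int rf = 3;
        System.out.println("current guard:  " + currentGuard(live, corrupt, rf));  // false: 2 >= 3
        System.out.println("proposed guard: " + proposedGuard(live, corrupt, rf)); // true: 4 > 3
    }
}
```

With the current guard the extra copies are never treated as surplus, so they stay on DN3 and DN4 and keep blocking new transfers; with the proposed guard they qualify for invalidation, after which those DNs can accept a fresh replica. Note that the two conditions are not equivalent in general: for (live=3, corrupt=0, RF=3) the current guard is true and the proposed one is false, so the exact comparison would need care in a real patch.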

> Blocks are not getting replicate even DN's are availble.
> --------------------------------------------------------
>                 Key: HDFS-3586
>                 URL: https://issues.apache.org/jira/browse/HDFS-3586
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node, name-node
>    Affects Versions: 2.0.0-alpha, 2.0.1-alpha, 3.0.0
>            Reporter: Brahma Reddy Battula
>         Attachments: HDFS-3586-analysis.txt
> Scenario:
> =========
> Started four DNs (say DN1, DN2, DN3 and DN4).
> Writing files with RF=3.
> A pipeline was formed with DN1->DN2->DN3.
> Since DN3's network is very slow, it is not able to send acks.
> The pipeline was formed again with DN1->DN2->DN4.
> Here DN4's network is also slow.
> So finally commitBlockSynchronization happened to DN1 and DN2 successfully.
> The block is present in all four DNs (finalized state in two DNs and RBW state in the other two).
> Here the NN asks DN3 and DN4 to replicate, but it fails since the replicas are already
> present in the RBW dir.

This message is automatically generated by JIRA.

