hadoop-hdfs-issues mailing list archives

From "Eli Collins (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1828) TestBlocksWithNotEnoughRacks intermittently fails assert
Date Thu, 14 Apr 2011 00:50:05 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019643#comment-13019643 ]

Eli Collins commented on HDFS-1828:
-----------------------------------

+1

I'll update the patch on HDFS-1562 to cover this case as well.

bq. I believe that under the circumstances of the test, curReplicas will in fact be
REPLICATION_FACTOR + 1, transiently.

I suspect this is because, post HDFS-15, the block remains in pending replications even though
the total number of replicas is sufficient. As soon as the new datanodes come up, a new replica
is scheduled, and an existing one is considered excess and scheduled for deletion (it is
considered excess because the replication factor has not yet been increased). The replication
factor is then increased, causing 2 new replicas to be scheduled. If these new replicas complete
before the excess replica is deleted, we end up with REPLICATION_FACTOR + 1 replicas.
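
One way a test could tolerate that transient overshoot is to poll until the count settles
instead of asserting on the first reading. The following is only a minimal sketch, not the
attached patch: ReplicaWait, waitForReplicas, and the IntSupplier standing in for however the
test reads curReplicas are all hypothetical names.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.IntSupplier;

/** Hypothetical helper for illustration; not part of the HDFS test suite. */
final class ReplicaWait {
    private ReplicaWait() {}

    /**
     * Polls currentReplicas until it equals expected or the timeout elapses.
     * Returns true if the expected count was observed, false on timeout or interrupt.
     */
    static boolean waitForReplicas(IntSupplier currentReplicas, int expected,
                                   long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            // A transient REPLICATION_FACTOR + 1 reading is simply retried.
            if (currentReplicas.getAsInt() == expected) {
                return true;
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt status and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

The test would then assert on the boolean result, with a timeout chosen to be long enough for
the excess replica to be deleted.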

> TestBlocksWithNotEnoughRacks intermittently fails assert
> --------------------------------------------------------
>
>                 Key: HDFS-1828
>                 URL: https://issues.apache.org/jira/browse/HDFS-1828
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.23.0
>            Reporter: Matt Foley
>            Assignee: Matt Foley
>             Fix For: 0.23.0
>
>         Attachments: TestBlocksWithNotEnoughRacks.java.patch, TestBlocksWithNotEnoughRacks_v2.patch
>
>
> In server.namenode.TestBlocksWithNotEnoughRacks.testSufficientlyReplicatedBlocksWithNotEnoughRacks,
> the assert fails at curReplicas == REPLICATION_FACTOR, but it seems that the count should go
> higher initially, and if the test doesn't wait for it to go back down, it will fail as a
> false positive.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
