hadoop-hdfs-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10816) TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor
Date Mon, 05 Jun 2017 21:06:04 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037595#comment-16037595 ]

Hudson commented on HDFS-10816:

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11825 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/11825/])
HDFS-10816. TestComputeInvalidateWork#testDatanodeReRegistration fails (kihwal: rev e4e203e0807fafc5dd765344d008e42bd51cc979)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestComputeInvalidateWork.java

> TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor
> -----------------------------------------------------------------------------------------------------------
>                 Key: HDFS-10816
>                 URL: https://issues.apache.org/jira/browse/HDFS-10816
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Eric Badger
>            Assignee: Eric Badger
>             Fix For: 2.9.0, 3.0.0-alpha4, 2.8.2
>         Attachments: HDFS-10816.001.patch, HDFS-10816.002.patch, HDFS-10816.002.patch,
> {noformat}
> java.lang.AssertionError: Expected invalidate blocks to be the number of DNs expected:<3> but was:<2>
> 	at org.junit.Assert.fail(Assert.java:88)
> 	at org.junit.Assert.failNotEquals(Assert.java:743)
> 	at org.junit.Assert.assertEquals(Assert.java:118)
> 	at org.junit.Assert.assertEquals(Assert.java:555)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.TestComputeInvalidateWork.testDatanodeReRegistration(TestComputeInvalidateWork.java:160)
> {noformat}
> The test fails because of a race condition between the test and the replication monitor. The default replication monitor interval is 3 seconds, which is about how long the test normally takes to run. The test deletes a file and then acquires the namesystem write lock. However, if the replication monitor fires between those two steps, it invalidates one of the blocks itself, so the test finds fewer pending invalidations than it expects and fails. This is easy to reproduce by removing the sleep() in the ReplicationMonitor's run() method in BlockManager.java, so that the replication monitor runs as often as possible and exacerbates the race.
> To fix the test, all that needs to be done is to turn off the replication monitor.
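
The race described above can be illustrated with a small, self-contained sketch. This is not the code from the attached patches and does not use any Hadoop classes: FakeNamesystem, monitorPass() and pendingSeenByTest() are hypothetical names invented for illustration, and the background monitor is reduced to a single call that may or may not happen between the test's two steps.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class InvalidateRaceSketch {

    /** Hypothetical stand-in for the namesystem state the test inspects. */
    static final class FakeNamesystem {
        final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        final Queue<String> pendingInvalidates = new ArrayDeque<>();
        boolean monitorEnabled = true;

        /** Deleting a file queues one replica per datanode for invalidation. */
        void deleteFile(int numDatanodes) {
            lock.writeLock().lock();
            try {
                for (int i = 0; i < numDatanodes; i++) {
                    pendingInvalidates.add("blk_1@dn" + i);
                }
            } finally {
                lock.writeLock().unlock();
            }
        }

        /** One pass of the simulated monitor: it consumes one pending invalidation. */
        void monitorPass() {
            if (!monitorEnabled) {
                return; // monitor turned off: the queue is left untouched
            }
            lock.writeLock().lock();
            try {
                pendingInvalidates.poll();
            } finally {
                lock.writeLock().unlock();
            }
        }
    }

    /**
     * Runs the test's two steps (delete, then take the write lock and count),
     * optionally letting the monitor fire in the gap between them.
     */
    static int pendingSeenByTest(boolean monitorEnabled, boolean monitorFiresInGap) {
        FakeNamesystem ns = new FakeNamesystem();
        ns.monitorEnabled = monitorEnabled;
        ns.deleteFile(3);                 // step 1: delete the file
        if (monitorFiresInGap) {
            ns.monitorPass();             // the race window
        }
        ns.lock.writeLock().lock();       // step 2: take the write lock
        try {
            return ns.pendingInvalidates.size();
        } finally {
            ns.lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("monitor on, fires in gap:  " + pendingSeenByTest(true, true));
        System.out.println("monitor on, does not fire: " + pendingSeenByTest(true, false));
        System.out.println("monitor off, tick in gap:  " + pendingSeenByTest(false, true));
    }
}
```

With three simulated datanodes, the first case yields 2 and the other two yield 3, which mirrors the "expected:<3> but was:<2>" failure and shows why disabling the monitor makes the count deterministic.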
