hadoop-hdfs-issues mailing list archives

From "Andrew Wang (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-4338) TestNameNodeMetrics#testCorruptBlock is flaky
Date Thu, 27 Dec 2012 20:20:12 GMT

     [ https://issues.apache.org/jira/browse/HDFS-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Wang updated HDFS-4338:
------------------------------

    Attachment: hdfs-4338.patch

It turns out the race is between the call to {{BlockManagerTestUtil#getComputedDatanodeWork()}}
in the test and the {{BlockManager#ReplicationMonitor}} (which also calls {{#getComputedDatanodeWork()}}).
The {{ScheduledReplicationBlocks}} metric reports the number of blocks scheduled for replication
the last time {{BlockManager#getComputedDatanodeWork()}} was called. If the {{ReplicationMonitor}}
runs after the call to {{BlockManagerTestUtil#getComputedDatanodeWork()}} but before the test checks
the metric, {{ScheduledReplicationBlocks}} is correctly reported as 0, since the corrupted block was
already scheduled for replication by the previous call.
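
To make the interleaving concrete, here is a minimal, deterministic sketch of the same pattern. It is not the HDFS code: {{ToyBlockManager}}, {{markCorrupt}}, {{computeDatanodeWork}} and {{scheduledReplicationBlocks}} are hypothetical stand-ins, and the second call is made inline rather than from a real background thread, on the assumption that the gauge simply records what the most recent pass scheduled.

```java
import java.util.ArrayDeque;
import java.util.Queue;

class ScheduledGaugeRace {
    /** Toy stand-in for the block manager; hypothetical, not the real HDFS class. */
    static class ToyBlockManager {
        private final Queue<String> needsReplication = new ArrayDeque<>();
        // Gauge: number of blocks scheduled by the most recent pass only.
        private int scheduledLastPass;

        synchronized void markCorrupt(String blockId) {
            needsReplication.add(blockId);
        }

        /** One scheduling pass; both the test and the monitor would call this. */
        synchronized int computeDatanodeWork() {
            int scheduled = needsReplication.size();
            needsReplication.clear();
            scheduledLastPass = scheduled; // overwrites the previous pass's value
            return scheduled;
        }

        synchronized int scheduledReplicationBlocks() {
            return scheduledLastPass;
        }
    }

    public static void main(String[] args) {
        ToyBlockManager bm = new ToyBlockManager();
        bm.markCorrupt("blk_1");

        // The test's explicit pass schedules the corrupt block.
        bm.computeDatanodeWork();
        System.out.println("after test pass:    " + bm.scheduledReplicationBlocks());

        // A monitor pass sneaks in before the test reads the gauge.
        bm.computeDatanodeWork();
        System.out.println("after monitor pass: " + bm.scheduledReplicationBlocks());
    }
}
```

In this model the gauge is overwritten on every pass, so an assertion made after the second pass sees 0 even though the block was scheduled exactly once, which is the situation described above.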

The fix is simply to remove the assert on {{ScheduledReplicationBlocks}}. I also removed an unnecessary
call to {{#updateState()}} (which is already called in {{#getComputedDatanodeWork()}}), and fixed a typo
in a nearby comment.
                
> TestNameNodeMetrics#testCorruptBlock is flaky
> ---------------------------------------------
>
>                 Key: HDFS-4338
>                 URL: https://issues.apache.org/jira/browse/HDFS-4338
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 3.0.0
>            Reporter: Andrew Wang
>            Assignee: Andrew Wang
>         Attachments: corruptblock, corruptblock.out, hdfs-4338.patch
>
>
> Ran some background cpuburn threads, got this stack trace:
> {noformat}
> Running org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 15.287 sec <<< FAILURE!
> testCorruptBlock(org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics)  Time elapsed: 14922 sec  <<< FAILURE!
> java.lang.AssertionError: Bad value for metric ScheduledReplicationBlocks expected:<1> but was:<0>
> 	at org.junit.Assert.fail(Assert.java:91)
> 	at org.junit.Assert.failNotEquals(Assert.java:645)
> 	at org.junit.Assert.assertEquals(Assert.java:126)
> 	at org.junit.Assert.assertEquals(Assert.java:470)
> 	at org.apache.hadoop.test.MetricsAsserts.assertGauge(MetricsAsserts.java:190)
> 	at org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics.testCorruptBlock(TestNameNodeMetrics.java:229)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
> 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
> 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
> 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
> 	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
> 	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
> 	at org.junit.runners.BlockJUnit4ClassRunner.runNotIgnored(BlockJUnit4ClassRunner.java:79)
> 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:71)
> 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:49)
> 	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
> 	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
> 	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
> 	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
> 	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
> 	at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
> 	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:242)
> 	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:137)
> 	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
> 	at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
> 	at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
> 	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
> 	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)
> Results :
> Failed tests:   testCorruptBlock(org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics): Bad value for metric ScheduledReplicationBlocks expected:<1> but was:<0>
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
