hadoop-hdfs-issues mailing list archives

From "Wei-Chiu Chuang (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-12279) TestPipelinesFailover#testPipelineRecoveryStress fails due to race condition
Date Tue, 08 Aug 2017 21:22:02 GMT
Wei-Chiu Chuang created HDFS-12279:
--------------------------------------

             Summary: TestPipelinesFailover#testPipelineRecoveryStress fails due to race condition
                 Key: HDFS-12279
                 URL: https://issues.apache.org/jira/browse/HDFS-12279
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: namenode, test
            Reporter: Wei-Chiu Chuang


Saw a test failure in a precommit build:
https://builds.apache.org/job/PreCommit-HDFS-Build/20600/testReport/org.apache.hadoop.hdfs.server.namenode.ha/TestPipelinesFailover/testPipelineRecoveryStress/

{noformat}
Error Message

Deferred
Stacktrace

java.lang.RuntimeException: Deferred
	at org.apache.hadoop.test.MultithreadedTestUtil$TestContext.checkException(MultithreadedTestUtil.java:130)
	at org.apache.hadoop.test.MultithreadedTestUtil$TestContext.stop(MultithreadedTestUtil.java:166)
	at org.apache.hadoop.hdfs.server.namenode.ha.HAStressTestHarness.shutdown(HAStressTestHarness.java:154)
	at org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover.testPipelineRecoveryStress(TestPipelinesFailover.java:493)
Caused by: java.lang.AssertionError: null
	at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor.addBlocksToBeInvalidated(DatanodeDescriptor.java:641)
	at org.apache.hadoop.hdfs.server.blockmanagement.InvalidateBlocks.invalidateWork(InvalidateBlocks.java:299)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.invalidateWorkForOneNode(BlockManager.java:4236)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeInvalidateWork(BlockManager.java:1736)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManagerTestUtil.computeInvalidationWork(BlockManagerTestUtil.java:169)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManagerTestUtil.computeAllPendingWork(BlockManagerTestUtil.java:185)
	at org.apache.hadoop.hdfs.server.namenode.ha.HAStressTestHarness$1.doAnAction(HAStressTestHarness.java:102)
	at org.apache.hadoop.test.MultithreadedTestUtil$RepeatingTestThread.doWork(MultithreadedTestUtil.java:222)
	at org.apache.hadoop.test.MultithreadedTestUtil$TestingThread.run(MultithreadedTestUtil.java:189)
{noformat}


From reading the code, the assertion can only fail because of a race condition that occurs
only in the test.

Specifically, the test uses BlockManagerTestUtil to call {{BlockManager#computeInvalidateWork}},
which gets the list of DataNodes from {{invalidateBlocks.getDatanodes()}}. It then uses that
list to perform block invalidation via {{InvalidateBlocks#invalidateWork}}, which calls
{{DatanodeDescriptor#addBlocksToBeInvalidated}}, where an assertion checks that the invalidation
list is not empty. However, if the BlockManager performs block invalidation for the same DataNode
before the test's call reaches {{DatanodeDescriptor#addBlocksToBeInvalidated}}, the invalidation
list can be empty, because no lock makes the two steps atomic.
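
The interleaving described above can be sketched in isolation. This is only an illustration under stated assumptions: {{PendingInvalidations}}, {{invalidateUnchecked}} and {{invalidateChecked}} are hypothetical stand-ins, not the real {{InvalidateBlocks}} or {{DatanodeDescriptor}} code, and the "background" invalidation is simulated inline so the outcome is deterministic. The tolerant variant is one possible way to handle the empty list, not necessarily the fix that should go into HDFS.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class InvalidationRaceSketch {

    /** Hypothetical stand-in for a per-DataNode pending-invalidation map. */
    static final class PendingInvalidations {
        private final Map<String, List<Long>> byNode = new HashMap<>();

        synchronized void add(String node, long blockId) {
            byNode.computeIfAbsent(node, k -> new ArrayList<>()).add(blockId);
        }

        /** Step 1: snapshot the nodes that currently have pending work. */
        synchronized List<String> getDatanodes() {
            return new ArrayList<>(byNode.keySet());
        }

        /** Step 2: remove and return the pending blocks for one node. */
        synchronized List<Long> drain(String node) {
            List<Long> blocks = byNode.remove(node);
            return blocks == null ? new ArrayList<>() : blocks;
        }
    }

    /** Racy caller: assumes the earlier snapshot is still valid. */
    static int invalidateUnchecked(PendingInvalidations pending, String node) {
        List<Long> blocks = pending.drain(node);
        if (blocks.isEmpty()) {
            // Plays the role of the failing assertion in the stack trace.
            throw new AssertionError("empty invalidation list for " + node);
        }
        return blocks.size();
    }

    /** Tolerant caller: an empty list means another caller already did the work. */
    static int invalidateChecked(PendingInvalidations pending, String node) {
        return pending.drain(node).size();
    }

    public static void main(String[] args) {
        PendingInvalidations pending = new PendingInvalidations();
        pending.add("dn1", 1001L);
        pending.add("dn1", 1002L);

        // The test thread takes its snapshot of nodes with pending work...
        List<String> snapshot = pending.getDatanodes();
        // ...and a second caller drains the same node before the test thread
        // gets to it (simulated inline instead of using a real thread).
        pending.drain("dn1");

        for (String node : snapshot) {
            System.out.println("checked: " + invalidateChecked(pending, node));
            try {
                invalidateUnchecked(pending, node);
            } catch (AssertionError e) {
                System.out.println("unchecked: " + e.getMessage());
            }
        }
    }
}
```

Each method in the sketch is synchronized on its own, yet the snapshot and the drain are still two separate steps, which is the check-then-act gap described above.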

This is not a problem for a real cluster, because there is only one BlockManager per NameNode
process, so the potential race condition is not exposed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

