hadoop-hdfs-issues mailing list archives

From "Matt Foley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1852) Umbrella task: Clean up HDFS unit tests for timing-sensitive conditions, improve both condition stimulators and condition detection loops
Date Thu, 21 Apr 2011 05:59:05 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13022621#comment-13022621 ]

Matt Foley commented on HDFS-1852:
----------------------------------

Here are some proposed rules for such unit test elements:

* All "wait for condition" loops must have timeouts. On timeout they must throw TimeoutException
with a useful message that reports the current values of the condition variables being observed
when the timeout occurs.
* Timeout values may be fixed (as a static final), or parameterized (as an argument to the
method containing the wait loop), whichever is appropriate for the condition being waited
on.
* Whenever a condition is waited on, and then later asserted, the condition waited on must
be at least as stringent as the condition later asserted.  (Counter-example:  do not "wait
while (x < FOO)" and then assert (x == FOO).  x == FOO+1 would exit the wait loop but fail
the assert.)
* When waiting for a transient condition, such as detection of a corrupt replica which will
self-heal soon after being detected, either use a busy-wait or a very small sleep interval
in the wait loop, or allow for the possibility of "missing" the condition being waited for.
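
A minimal sketch of a wait loop that follows the first three rules, in plain Java with no HDFS
dependencies. The class name WaitLoopSketch, the helper waitFor, and the "replicas" counter are
hypothetical names chosen for illustration; they are not existing DFSTestUtil methods. The timeout
is passed as a parameter, the TimeoutException message reports the last observed value, and the
caller asserts on the same condition that the loop waited for.

```java
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntPredicate;
import java.util.function.IntSupplier;

// Hypothetical helper for illustration only; not part of Hadoop.
class WaitLoopSketch {

    // Polls 'observed' until 'done' accepts the value, or throws on timeout.
    // Returns the value that satisfied the condition so the caller can
    // assert on exactly what the loop saw.
    static int waitFor(IntSupplier observed, IntPredicate done,
                       long timeoutMs, long sleepMs, String what)
            throws TimeoutException, InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            int value = observed.getAsInt();
            if (done.test(value)) {
                return value;
            }
            // Overflow-safe comparison against the deadline.
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("Timed out after " + timeoutMs
                        + " ms waiting for " + what
                        + "; last observed value = " + value);
            }
            Thread.sleep(sleepMs);
        }
    }

    public static void main(String[] args) throws Exception {
        final int expected = 3;
        AtomicInteger replicas = new AtomicInteger(1);

        // Stand-in for a background thread that eventually fixes replication.
        Thread healer = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                return;
            }
            replicas.set(expected);
        });
        healer.start();

        // Wait on (v == expected), not (v >= expected): the waited-on
        // condition is as stringent as the assertion that follows.
        int seen = waitFor(replicas::get, v -> v == expected,
                5_000, 10, "replicas == " + expected);
        if (seen != expected) {
            throw new AssertionError("expected " + expected + " but saw " + seen);
        }
        System.out.println("condition met, replicas = " + seen);
        healer.join();
    }
}
```

For a transient condition (the fourth rule), the same helper could be called with a very small
sleepMs, at the cost of more CPU use while polling.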

Commonly repeated methods that should be refactored into and imported from DFSTestUtil include:

* waitForReplication - wait for under- or over-replication to be normalized to the expected
replication factor
* waitForCorrupt - wait for a transient detection of a corrupted replica
* corruptReplica - corrupt a specified number of replicas out of a specified block

Please contribute other suggested rules and share-able methods, and open bugs against specific
unit test classes that need to be improved in this way.

> Umbrella task: Clean up HDFS unit tests for timing-sensitive conditions, improve both condition stimulators and condition detection loops
> -----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-1852
>                 URL: https://issues.apache.org/jira/browse/HDFS-1852
>             Project: Hadoop HDFS
>          Issue Type: Test
>          Components: test
>    Affects Versions: 0.22.0
>            Reporter: Matt Foley
>
> Both namenode and datanode have multiple background threads responsible for detecting
> and responding to time-changing conditions, such as corrupt replicas, under- or over-replicated
> blocks, etc.  The unit tests that attempt to exercise these threads often duplicate complex
> combinations of actions, and use "wait for" or "wait until" loops to detect the results.
> The quality and robustness of these loops vary widely, and some problematic ones cause recurring
> intermittent false positives in Hudson.  This is an umbrella task for a set of bugs to be
> opened, to clean up these usages and move the common ones to DFSTestUtil.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
