hadoop-hdfs-issues mailing list archives

From "Steve Loughran (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2966) TestNameNodeMetrics tests can fail under load
Date Sat, 18 Feb 2012 09:48:59 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210888#comment-13210888 ]

Steve Loughran commented on HDFS-2966:
--------------------------------------

My planned solution to this is to move from sleep-then-assert to sleep-poll-repeat over a
longer period of time. If the state is reached sooner, the test finishes earlier, but if the
machine is overloaded the test will stretch out. This may make the tests faster on some
machines, as well as less brittle on others.
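
A minimal sketch of the sleep-poll-repeat idea, for illustration only. The class name
PollingWait, the method waitFor and its parameters are hypothetical and are not taken from
the Hadoop test code or from any patch on this issue; the condition to poll would have to
be supplied by the test itself.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

/** Hypothetical helper illustrating sleep-poll-repeat; not part of Hadoop. */
public final class PollingWait {
    private PollingWait() {}

    /**
     * Polls the condition until it holds or the timeout expires.
     * Returns as soon as the condition is true, so a fast machine
     * finishes early; a slow machine gets up to timeoutMillis.
     */
    public static void waitFor(BooleanSupplier check,
                               long intervalMillis,
                               long timeoutMillis)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        // Check first, so no sleep happens if the state is already reached.
        while (!check.getAsBoolean()) {
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                throw new TimeoutException(
                    "condition not met within " + timeoutMillis + " ms");
            }
            // Never sleep past the deadline, and always sleep at least 1 ms.
            long remainingMillis = Math.max(1L, remainingNanos / 1_000_000L);
            Thread.sleep(Math.min(intervalMillis, remainingMillis));
        }
    }
}
```

A test could then replace a fixed sleep with something like
PollingWait.waitFor(() -> someMetricHasReachedExpectedValue(), 100, 60000), where the
condition is a placeholder for whatever state the assertion depends on. The timeout can be
generous because it is only consumed in full when the machine really is overloaded.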

                
> TestNameNodeMetrics tests can fail under load
> ---------------------------------------------
>
>                 Key: HDFS-2966
>                 URL: https://issues.apache.org/jira/browse/HDFS-2966
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 0.24.0
>         Environment: OS/X running intellij IDEA, firefox, winxp in a virtualbox.
>            Reporter: Steve Loughran
>            Priority: Minor
>
> I've managed to recreate HDFS-540 and HDFS-2434 by the simple technique of running the
> HDFS tests on a desktop without enough memory for all the programs trying to run. Things
> got swapped out and the tests failed as the DN heartbeats didn't come in on time.
> Both tests rely on {{waitForDeletion()}} to block until the delete operation has
> completed, but all it does is sleep for the same number of seconds as there are datanodes.
> This is too brittle: it may work on a lightly loaded system, but not on a system under
> heavy load where it is taking longer to replicate than expected.
> Immediate fix: double or triple the sleep time?
> Better fix: have the thread block until all the DN heartbeats have finished.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
