hadoop-yarn-issues mailing list archives

From "zhihai xu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2666) TestFairScheduler.testContinuousScheduling fails Intermittently
Date Tue, 31 Mar 2015 10:39:53 GMT

    [ https://issues.apache.org/jira/browse/YARN-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14388352#comment-14388352 ]

zhihai xu commented on YARN-2666:
---------------------------------

Hi [~ywskycn], could you assign this JIRA to me?
I think I know what causes this intermittent failure.
The problem is that ContinuousSchedulingThread calls continuousSchedulingAttempt periodically.

continuousSchedulingAttempt does not hold the FairScheduler lock, so the following loop can run at any time,
{code}
    for (NodeId nodeId : nodeIdList) {
      FSSchedulerNode node = getFSSchedulerNode(nodeId);
      try {
        if (node != null && Resources.fitsIn(minimumAllocation,
            node.getAvailableResource())) {
          attemptScheduling(node);
        }
      } catch (Throwable ex) {
        LOG.error("Error while attempting scheduling for node " + node +
            ": " + ex.toString(), ex);
      }
    }
{code}
even while testContinuousScheduling is calling scheduler.allocate to make a container allocation request.
application.updateResourceRequests (called from scheduler.allocate) can run right after attemptScheduling
has handled the first node and before it handles the second node. In that case the second node, which has
less available resource because it already holds the first container, allocates the container for this
request. That is the failure: both containers are allocated on the same node.
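The interleaving above can be replayed deterministically, without threads. The following is a minimal, hypothetical Java model, not FairScheduler code: the `availableMb` map, the pending-request counter and the `afterFirstNode` hook are illustrative assumptions. It assumes two equally sized 8 GB nodes, with node1 already holding the first container, and a pass that visits nodes from most to least available resource; the hook lets the second request arrive after the first node in that order has been visited.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical single-threaded model of the race (NOT FairScheduler code).
 * Node sizes, field names and the afterFirstNode hook are assumptions.
 */
public class ContinuousSchedulingRace {
  static final int CONTAINER_MB = 1024;

  final Map<String, Integer> availableMb = new LinkedHashMap<>();
  final Map<String, Integer> containersOnNode = new LinkedHashMap<>();
  int pendingRequests = 0;

  ContinuousSchedulingRace() {
    // Two 8 GB nodes; node1 already holds the first 1 GB container.
    availableMb.put("node1", 8 * 1024 - CONTAINER_MB);
    containersOnNode.put("node1", 1);
    availableMb.put("node2", 8 * 1024);
    containersOnNode.put("node2", 0);
  }

  /** One pass of the loop: nodes are visited emptiest first (assumed ordering). */
  void schedulingPass(Runnable afterFirstNode) {
    List<String> order = new ArrayList<>(availableMb.keySet());
    order.sort(Comparator.comparing((String n) -> availableMb.get(n)).reversed());
    for (int i = 0; i < order.size(); i++) {
      String node = order.get(i);
      // Simplified attemptScheduling: place one container if one is pending.
      if (pendingRequests > 0 && availableMb.get(node) >= CONTAINER_MB) {
        availableMb.merge(node, -CONTAINER_MB, Integer::sum);
        containersOnNode.merge(node, 1, Integer::sum);
        pendingRequests--;
      }
      if (i == 0) {
        afterFirstNode.run(); // simulated allocate() landing mid-pass
      }
    }
  }

  /** Number of distinct nodes holding at least one container. */
  long nodesWithContainers() {
    return containersOnNode.values().stream().filter(c -> c > 0).count();
  }

  public static void main(String[] args) {
    // Racy: the request arrives after the emptiest node (node2) was visited.
    ContinuousSchedulingRace racy = new ContinuousSchedulingRace();
    racy.schedulingPass(() -> racy.pendingRequests++);
    System.out.println("racy: " + racy.nodesWithContainers());

    // Ordered: the request is recorded first, then a full pass runs.
    ContinuousSchedulingRace ordered = new ContinuousSchedulingRace();
    ordered.pendingRequests++;
    ordered.schedulingPass(() -> { });
    System.out.println("ordered: " + ordered.nodesWithContainers());
  }
}
```

Running `main` prints `racy: 1` and then `ordered: 2`; the racy case is the same count that the test's `expected:<2> but was:<1>` assertion reports.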
The default ContinuousSchedulingSleepMs is 5 ms, which is very short. Increasing ContinuousSchedulingSleepMs
would make the failure much less frequent, but the race would still exist. We can make the test deterministic
by stopping the ContinuousSchedulingThread before the second allocation request and manually calling
continuousSchedulingAttempt after it.
I uploaded a patch that takes this approach.
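The fix pattern (stop the background thread before the second request, then run one scheduling pass by hand) can be sketched with a toy scheduler. `ToyScheduler`, `startContinuousScheduling` and `stopContinuousScheduling` are hypothetical names, not FairScheduler's API; only the ordering idea comes from the comment above.

```java
/** Hypothetical sketch of the deterministic test pattern (NOT FairScheduler code). */
public class DeterministicSchedulingSketch {

  /** Toy scheduler with just enough state to show the pattern. */
  static class ToyScheduler {
    private int pendingRequests;
    private int allocatedContainers;
    private Thread schedulingThread;

    /** Hypothetical stand-in for scheduler.allocate(): records new requests. */
    synchronized void allocate(int requests) {
      pendingRequests += requests;
    }

    /** One scheduling pass: turns every pending request into a container. */
    synchronized void continuousSchedulingAttempt() {
      allocatedContainers += pendingRequests;
      pendingRequests = 0;
    }

    synchronized int allocated() { return allocatedContainers; }

    synchronized int pending() { return pendingRequests; }

    /** Background loop that calls continuousSchedulingAttempt() every sleepMs. */
    void startContinuousScheduling(long sleepMs) {
      schedulingThread = new Thread(() -> {
        try {
          while (!Thread.currentThread().isInterrupted()) {
            continuousSchedulingAttempt();
            Thread.sleep(sleepMs);
          }
        } catch (InterruptedException e) {
          // Interrupted while sleeping: fall through and let the thread exit.
        }
      }, "ContinuousSchedulingThread");
      schedulingThread.setDaemon(true);
      schedulingThread.start();
    }

    /** Interrupts the background loop and waits until it has fully exited. */
    void stopContinuousScheduling() throws InterruptedException {
      schedulingThread.interrupt();
      schedulingThread.join();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    ToyScheduler scheduler = new ToyScheduler();
    scheduler.startContinuousScheduling(5);

    // First request: the background thread is allowed to serve it.
    scheduler.allocate(1);
    while (scheduler.allocated() < 1) {
      Thread.sleep(1);
    }

    // Stop the thread BEFORE the second request, so no pass can interleave with it.
    scheduler.stopContinuousScheduling();
    scheduler.allocate(1);
    System.out.println("pending before manual pass: " + scheduler.pending());

    // Run exactly one pass by hand, after the request is fully recorded.
    scheduler.continuousSchedulingAttempt();
    System.out.println("allocated: " + scheduler.allocated());
  }
}
```

Because the background thread is joined before the second request, the only pass that can see that request is the manual one, so the test always observes the same ordering (`pending before manual pass: 1`, then `allocated: 2`).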

> TestFairScheduler.testContinuousScheduling fails Intermittently
> ---------------------------------------------------------------
>
>                 Key: YARN-2666
>                 URL: https://issues.apache.org/jira/browse/YARN-2666
>             Project: Hadoop YARN
>          Issue Type: Test
>          Components: scheduler
>            Reporter: Tsuyoshi Ozawa
>            Assignee: Wei Yan
>         Attachments: YARN-2666.000.patch
>
>
> The test fails on trunk.
> {code}
> Tests run: 79, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.698 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler
> testContinuousScheduling(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler)  Time elapsed: 0.582 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<2> but was:<1>
> 	at org.junit.Assert.fail(Assert.java:88)
> 	at org.junit.Assert.failNotEquals(Assert.java:743)
> 	at org.junit.Assert.assertEquals(Assert.java:118)
> 	at org.junit.Assert.assertEquals(Assert.java:555)
> 	at org.junit.Assert.assertEquals(Assert.java:542)
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testContinuousScheduling(TestFairScheduler.java:3372)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
