hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6272) TestAMRMClient#testAMRMClientWithContainerResourceChange fails intermittently
Date Wed, 19 Apr 2017 15:06:41 GMT

    [ https://issues.apache.org/jira/browse/YARN-6272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15974876#comment-15974876 ]

Jason Lowe commented on YARN-6272:
----------------------------------

I've also seen this stacktrace on 2.8:
{noformat}
java.lang.AssertionError: expected:<1> but was:<2>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:743)
	at org.junit.Assert.assertEquals(Assert.java:118)
	at org.junit.Assert.assertEquals(Assert.java:555)
	at org.junit.Assert.assertEquals(Assert.java:542)
	at org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.doContainerResourceChange(TestAMRMClient.java:920)
	at org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.testAMRMClientWithContainerResourceChange(TestAMRMClient.java:813)
{noformat}

In the above case, it looks like the nodemanager happened to be heartbeating just as the app
made the allocate call that requested the increase.  As a result, the increase and the
decrease were both processed in the same heartbeat, which the test explicitly does not
expect.

The test itself is very fragile.  It launches a full minicluster and relies on hardcoded sleeps
sprinkled in various places, hoping asynchronous events have been processed in the interim.  That
not only leads directly to flaky tests but also slows down the unit test unnecessarily.  Either
the test needs to be made more tolerant of the asynchronous events in flight, or it needs to
ditch the minicluster and explicitly manage the cluster heartbeating.  The former can be done by
having the test poll via app alloc heartbeats until it gets all the responses it needs rather than
assuming which heartbeats will get which responses.  The latter can be done by using MockRM,
MockNM, and drain dispatchers so the test knows exactly which heartbeats have been completely
processed and thus knows which app alloc calls will get the appropriate responses.  This latter
approach would also eliminate the need for any arbitrary polling/sleeping intervals and speed
up the test significantly.
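
As a rough illustration of the first option (polling until all expected responses have
arrived), here is a minimal sketch.  It does not use the real AMRMClient or TestAMRMClient
code: AllocPollingSketch, Response and pollUntilSeen are hypothetical names, and the heartbeat
is modeled as a plain Supplier so the example stays self-contained.  The point is only that
the test accumulates results across heartbeats and asserts on the totals, so it passes whether
the increase and decrease arrive together or in separate responses.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

/** Sketch only: none of these names come from TestAMRMClient or the YARN API. */
final class AllocPollingSketch {

    /** Hypothetical stand-in for one allocate response. */
    static final class Response {
        final int increased;
        final int decreased;

        Response(int increased, int decreased) {
            this.increased = increased;
            this.decreased = decreased;
        }
    }

    /**
     * Calls heartbeat until the accumulated counts reach the wanted totals
     * or maxAttempts heartbeats have been sent.
     * Returns {increased, decreased} as accumulated so far.
     */
    static int[] pollUntilSeen(Supplier<Response> heartbeat, int wantIncreased,
                               int wantDecreased, int maxAttempts, long sleepMs) {
        int increased = 0;
        int decreased = 0;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            Response r = heartbeat.get();
            increased += r.increased;
            decreased += r.decreased;
            if (increased >= wantIncreased && decreased >= wantDecreased) {
                break; // everything we were waiting for has arrived
            }
            try {
                Thread.sleep(sleepMs); // back off before the next heartbeat
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return new int[] {increased, decreased};
    }

    public static void main(String[] args) {
        // The increase and the decrease arrive in separate responses here.
        Iterator<Response> split = Arrays.asList(
            new Response(0, 0), new Response(1, 0), new Response(0, 1)).iterator();
        int[] totals = pollUntilSeen(split::next, 1, 1, 10, 0L);
        System.out.println("increased=" + totals[0] + " decreased=" + totals[1]);
    }
}
```

A test written this way would assert on the returned totals once, after the loop, instead of
asserting on the contents of any individual response.  The attempt limit still bounds how long
a genuinely broken run can take.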


> TestAMRMClient#testAMRMClientWithContainerResourceChange fails intermittently
> -----------------------------------------------------------------------------
>
>                 Key: YARN-6272
>                 URL: https://issues.apache.org/jira/browse/YARN-6272
>             Project: Hadoop YARN
>          Issue Type: Test
>          Components: yarn
>    Affects Versions: 3.0.0-alpha3
>            Reporter: Ray Chiang
>
> I'm seeing this unit test fail fairly often in trunk:
> testAMRMClientWithContainerResourceChange(org.apache.hadoop.yarn.client.api.impl.TestAMRMClient) Time elapsed: 5.113 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<1> but was:<0>
>         at org.junit.Assert.fail(Assert.java:88)
>         at org.junit.Assert.failNotEquals(Assert.java:743)
>         at org.junit.Assert.assertEquals(Assert.java:118)
>         at org.junit.Assert.assertEquals(Assert.java:555)
>         at org.junit.Assert.assertEquals(Assert.java:542)
>         at org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.doContainerResourceChange(TestAMRMClient.java:1087)
>         at org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.testAMRMClientWithContainerResourceChange(TestAMRMClient.java:963)




