hadoop-yarn-issues mailing list archives

From "Sunil G (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4615) TestAbstractYarnScheduler#testResourceRequestRecoveryToTheRightAppAttempt fails occasionally
Date Wed, 27 Jan 2016 14:09:39 GMT

    [ https://issues.apache.org/jira/browse/YARN-4615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15119268#comment-15119268 ]

Sunil G commented on YARN-4615:

      // AM crashes, and a new app-attempt gets created
      node.nodeHeartbeat(applicationAttemptOneID, 1, ContainerState.COMPLETE);
      rm.waitForState(node, am1ContainerID, RMContainerState.COMPLETED);
      RMAppAttempt rmAppAttempt2 = MockRM.waitForAttemptScheduled(rmApp, rm);

The code snippet above is from the test case named in the JIRA title, and {{MockRM.waitForAttemptScheduled}}
is the call that reports the wrong-state failure.

In the line above, {{rm.waitForState}} verifies that the AM container state is COMPLETED. Then
{{waitForAttemptScheduled}} waits until the next attempt reaches SCHEDULED. However, the attempt
goes to ALLOCATED instead (an extra node heartbeat might have arrived and caused the new AM
container to be allocated).
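To make the suspected race concrete, here is a deliberately simplified, hypothetical model. It is not YARN's actual RMAppAttemptImpl state machine, and the class and method names are invented for illustration. It assumes that any node heartbeat arriving while the new attempt is SCHEDULED lets the scheduler allocate its AM container, so a check for SCHEDULED made after that heartbeat can only observe ALLOCATED.

```java
// Hypothetical, simplified model of the race; not YARN's real attempt state machine.
public class AttemptRaceSketch {
    enum AttemptState { SCHEDULED, ALLOCATED }

    static final class Attempt {
        private AttemptState state = AttemptState.SCHEDULED;

        // Assumption: a node heartbeat lets the scheduler allocate the AM container,
        // moving the attempt from SCHEDULED to ALLOCATED.
        void onNodeHeartbeat() {
            if (state == AttemptState.SCHEDULED) {
                state = AttemptState.ALLOCATED;
            }
        }

        AttemptState state() {
            return state;
        }
    }

    public static void main(String[] args) {
        Attempt attempt2 = new Attempt();
        // An extra heartbeat, e.g. one sent by a helper while it was waiting.
        attempt2.onNodeHeartbeat();
        // A later check for SCHEDULED fails: the attempt has already moved on.
        System.out.println("expected SCHEDULED, actual " + attempt2.state());
    }
}
```

Running `main` prints `expected SCHEDULED, actual ALLOCATED`, which mirrors the assertion message in the failure report.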

Looking at {{rm.waitForState}}, it sends a node heartbeat whenever the state is not yet the
expected one (while waiting). That is not needed here, because the test has already sent a
heartbeat with the container-completed details. I suspect the AM container had not yet reached
{{RMContainerState.COMPLETED}} when {{rm.waitForState}} first checked its state, so this method
sent one extra heartbeat.
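A hypothetical sketch of the wait pattern described above: a helper that sends a heartbeat after every unsuccessful poll. The helper name and the poll-count limit are assumptions for illustration, not MockRM's actual code. If the COMPLETED transition is processed asynchronously and is not yet visible on the first check, the helper sends one heartbeat that the test never asked for.

```java
import java.util.function.BooleanSupplier;

public class HeartbeatingWaitSketch {
    // Hypothetical model of a wait helper that heartbeats while it polls.
    static boolean waitWithHeartbeats(BooleanSupplier reached, Runnable heartbeat, int maxPolls) {
        for (int i = 0; i < maxPolls; i++) {
            if (reached.getAsBoolean()) {
                return true;
            }
            // Side effect: each heartbeat may let the scheduler allocate containers.
            heartbeat.run();
        }
        return reached.getAsBoolean();
    }

    public static void main(String[] args) {
        // Assumption: the COMPLETED event is handled asynchronously, so the
        // first check still sees the container as not completed.
        int[] checks = {0};
        BooleanSupplier containerCompleted = () -> ++checks[0] > 1;
        int[] heartbeats = {0};
        boolean ok = waitWithHeartbeats(containerCompleted, () -> heartbeats[0]++, 10);
        System.out.println("completed=" + ok + ", extra heartbeats=" + heartbeats[0]);
    }
}
```

Running `main` prints `completed=true, extra heartbeats=1`. That single unrequested heartbeat is the kind of side effect suspected of pushing the second attempt from SCHEDULED to ALLOCATED.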

I will upload a patch with a new {{rm.waitForState}} variant that does not send a node heartbeat;
instead, it only waits until the expected state is reached or the timeout expires.
[~rohithsharma], please share your thoughts.
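A minimal sketch of the proposed side-effect-free wait, assuming it simply polls a state until the state matches or a timeout expires. `waitForState` here is a hypothetical standalone helper, not the signature used in the patch. In the test, the supplier would read the container or attempt state from the RM.

```java
import java.util.Objects;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class PassiveWaitSketch {
    // Hypothetical helper: polls only, never sends heartbeats, gives up after timeoutMs.
    static <S> boolean waitForState(Supplier<S> current, S expected, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (true) {
            if (Objects.equals(current.get(), expected)) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without triggering any state change
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Simulated asynchronous state change roughly 100 ms after start.
        long start = System.nanoTime();
        Supplier<String> state = () ->
                System.nanoTime() - start >= TimeUnit.MILLISECONDS.toNanos(100) ? "COMPLETED" : "RUNNING";
        System.out.println(waitForState(state, "COMPLETED", 2000, 10)); // true
        System.out.println(waitForState(() -> "RUNNING", "COMPLETED", 50, 10)); // false
    }
}
```

Because the helper never triggers a heartbeat, the scheduler only sees the heartbeats the test sends explicitly. The new attempt should therefore stay SCHEDULED until the test itself drives allocation.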

> TestAbstractYarnScheduler#testResourceRequestRecoveryToTheRightAppAttempt fails occasionally
> --------------------------------------------------------------------------------------------
>                 Key: YARN-4615
>                 URL: https://issues.apache.org/jira/browse/YARN-4615
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: test
>            Reporter: Jason Lowe
> Sometimes TestAbstractYarnScheduler#testResourceRequestRecoveryToTheRightAppAttempt will fail like this:
> {noformat}
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.TestAbstractYarnScheduler
> testResourceRequestRecoveryToTheRightAppAttempt[1](org.apache.hadoop.yarn.server.resourcemanager.scheduler.TestAbstractYarnScheduler)  Time elapsed: 77.427 sec  <<< FAILURE!
> java.lang.AssertionError: Attempt state is not correct (timedout): expected: SCHEDULED actual: ALLOCATED for the application attempt appattempt_1453254869107_0001_000002
> 	at org.junit.Assert.fail(Assert.java:88)
> 	at org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:197)
> 	at org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:172)
> 	at org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForAttemptScheduled(MockRM.java:831)
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.TestAbstractYarnScheduler.testResourceRequestRecoveryToTheRightAppAttempt(TestAbstractYarnScheduler.java:572)
> {noformat}

This message was sent by Atlassian JIRA
