hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5416) TestRMRestart#testRMRestartWaitForPreviousAMToFinish failed intermittently due to not wait SchedulerApplicationAttempt to be stopped
Date Wed, 27 Jul 2016 19:59:20 GMT

    [ https://issues.apache.org/jira/browse/YARN-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15396249#comment-15396249 ]

Jason Lowe commented on YARN-5416:
----------------------------------

bq. I think we can close this as dup of that. What do you think?

I don't much care whether we close this one in favor of that one or vice versa, as long as
we don't keep both open.  Since this is the one that has a patch, I'll go ahead and comment
on the patch here, as Eric has also done.

bq. seems only necessary to wait before launch another AM immediately

I agree with Eric that it looks like another place was missed in the test.  IIUC we launch
AM1, wait for it to enter the FAILED state, and then launch AM2.  This patch changes that to
do a more thorough wait before trying to launch AM2.  However, later in the same test we wait
for the second AM to fail and then launch a third attempt, which looks like the same case we're
trying to fix: waiting for a previous AM to fully stop before immediately launching another
attempt:
{code}
    rm2.waitForState(am2.getApplicationAttemptId(), RMAppAttemptState.FAILED);
    launchAM(rmApp, rm2, nm1);
    Assert.assertEquals(3, rmApp.getAppAttempts().size());
{code}
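
One way to picture the "more thorough wait" is a poll-until-true helper with a timeout, called before each relaunch. The Java sketch below is illustrative only and is not taken from the patch: the class WaitForAttemptStopped, the helper waitFor, and the attemptStopped flag are hypothetical stand-ins for whatever check the test would use to confirm that the previous SchedulerApplicationAttempt has stopped.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

public class WaitForAttemptStopped {

    /**
     * Polls the condition until it returns true or the timeout elapses.
     * Returns true if the condition was met, false on timeout.
     * (Hypothetical helper, not part of the YARN test code.)
     */
    static boolean waitFor(BooleanSupplier condition, long intervalMs, long timeoutMs)
            throws InterruptedException {
        long start = System.nanoTime();
        long timeoutNanos = timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - start >= timeoutNanos) {
                return false;
            }
            Thread.sleep(intervalMs);
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumption: this flag stands in for "the previous attempt has been
        // fully stopped in the scheduler"; the real test would query the RM.
        AtomicBoolean attemptStopped = new AtomicBoolean(false);

        // Simulate the scheduler finishing its cleanup a little later,
        // after the attempt has already reached the FAILED state.
        Thread cleanup = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            attemptStopped.set(true);
        });
        cleanup.start();

        // Only launch the next attempt once the previous one is really gone.
        boolean stopped = waitFor(attemptStopped::get, 10, 2000);
        cleanup.join();
        System.out.println(stopped
                ? "previous attempt stopped; safe to launch the next AM"
                : "timed out waiting for previous attempt to stop");
    }
}
```

In the test itself, the same kind of wait would presumably be needed before both the second and the third launchAM call, not only the first relaunch.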

> TestRMRestart#testRMRestartWaitForPreviousAMToFinish failed intermittently due to not wait SchedulerApplicationAttempt to be stopped
> ------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-5416
>                 URL: https://issues.apache.org/jira/browse/YARN-5416
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: test, yarn
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Minor
>         Attachments: YARN-5416.patch
>
>
> The test failure stack is:
> Running org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
> Tests run: 54, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 385.338 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
> testRMRestartWaitForPreviousAMToFinish[0](org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart)  Time elapsed: 43.134 sec  <<< FAILURE!
> java.lang.AssertionError: AppAttempt state is not correct (timedout) expected:<ALLOCATED> but was:<SCHEDULED>
> 	at org.junit.Assert.fail(Assert.java:88)
> 	at org.junit.Assert.failNotEquals(Assert.java:743)
> 	at org.junit.Assert.assertEquals(Assert.java:118)
> 	at org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:86)
> 	at org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:594)
> 	at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.launchAM(TestRMRestart.java:1008)
> 	at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartWaitForPreviousAMToFinish(TestRMRestart.java:530)
> This is due to the same issue that was partially fixed in YARN-4968.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


