From: "Jason Lowe (JIRA)"
To: yarn-issues@hadoop.apache.org
Date: Wed, 27 Jul 2016 19:59:20 +0000 (UTC)
Subject: [jira] [Commented] (YARN-5416) TestRMRestart#testRMRestartWaitForPreviousAMToFinish failed intermittently due to not wait SchedulerApplicationAttempt to be stopped

    [ https://issues.apache.org/jira/browse/YARN-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15396249#comment-15396249 ]

Jason Lowe commented on YARN-5416:
----------------------------------

bq. I think we can close this as dup of that. What do you think?

I don't care much if we want to close this one for that one or vice-versa, just that we shouldn't keep both open. Since this is the one that has a patch, I'll go ahead and comment on the patch here as Eric has also done.

bq. seems only necessary to wait before launch another AM immediately

I agree with Eric that it looks like another place was missed in the test. IIUC we launch AM1 then wait for it to enter the FAILED state then launch AM2. This patch changes that to do a more thorough wait before trying to launch AM2. However later in the same test we wait for the second AM to fail and launch a third attempt, which looks like the same case we're trying to fix -- waiting for a previous AM to fully stop before immediately launching another attempt:
{code}
    rm2.waitForState(am2.getApplicationAttemptId(), RMAppAttemptState.FAILED);
    launchAM(rmApp, rm2, nm1);
    Assert.assertEquals(3, rmApp.getAppAttempts().size());
{code}


> TestRMRestart#testRMRestartWaitForPreviousAMToFinish failed intermittently due to not wait SchedulerApplicationAttempt to be stopped
> ------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-5416
>                 URL: https://issues.apache.org/jira/browse/YARN-5416
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: test, yarn
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Minor
>         Attachments: YARN-5416.patch
>
>
> The test failure stack is:
> Running org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
> Tests run: 54, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 385.338 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
> testRMRestartWaitForPreviousAMToFinish[0](org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart)  Time elapsed: 43.134 sec  <<< FAILURE!
> java.lang.AssertionError: AppAttempt state is not correct (timedout) expected: but was:
> 	at org.junit.Assert.fail(Assert.java:88)
> 	at org.junit.Assert.failNotEquals(Assert.java:743)
> 	at org.junit.Assert.assertEquals(Assert.java:118)
> 	at org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:86)
> 	at org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:594)
> 	at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.launchAM(TestRMRestart.java:1008)
> 	at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartWaitForPreviousAMToFinish(TestRMRestart.java:530)
> This is due to the same issue that was partially fixed in YARN-4968.
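
For readers unfamiliar with this kind of race, the pattern discussed in the comment (an attempt reporting FAILED before its scheduler-side cleanup has finished) can be illustrated outside Hadoop. The following is only a minimal, self-contained sketch: `WaitSketch`, `waitFor`, and `FakeAttempt` are hypothetical names invented for illustration and are not part of the YARN test code or of the attached patch. It shows a test polling for both the "failed" and the "stopped" condition before moving on to launch the next attempt.

```java
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

class WaitSketch {

  /** Polls check until it returns true, or throws once timeoutMs has elapsed. */
  static void waitFor(BooleanSupplier check, long intervalMs, long timeoutMs)
      throws InterruptedException, TimeoutException {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (!check.getAsBoolean()) {
      if (System.nanoTime() >= deadline) {
        throw new TimeoutException("condition not met within " + timeoutMs + " ms");
      }
      Thread.sleep(intervalMs);
    }
  }

  /** Hypothetical stand-in for an attempt whose state flips before cleanup finishes. */
  static final class FakeAttempt {
    final AtomicBoolean failed = new AtomicBoolean(false);
    final AtomicBoolean stopped = new AtomicBoolean(false);

    /** Marks the attempt failed right away; the "stopped" flag is set later. */
    Thread failAsync(long stopDelayMs) {
      failed.set(true);
      Thread t = new Thread(() -> {
        try {
          Thread.sleep(stopDelayMs);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
        stopped.set(true);
      });
      t.start();
      return t;
    }
  }

  public static void main(String[] args) throws Exception {
    FakeAttempt previous = new FakeAttempt();
    previous.failAsync(50);

    // Waiting only for "failed" returns at once and would leave a race window.
    waitFor(previous.failed::get, 10, 1000);
    // Waiting for "stopped" as well closes that window before the next launch.
    waitFor(previous.stopped::get, 10, 1000);

    System.out.println("safe to launch next attempt: " + previous.stopped.get());
  }
}
```

In this toy model, checking only `failed` corresponds to the `waitForState(..., FAILED)` call quoted above, and the extra wait on `stopped` corresponds to the more thorough wait the patch is described as adding. How the real test should detect that the scheduler attempt has stopped is left to the patch itself.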