Date: Mon, 6 Oct 2014 23:32:34 +0000 (UTC)
From: "Jian He (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-2649) Flaky test TestAMRMRPCNodeUpdates

[ https://issues.apache.org/jira/browse/YARN-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161210#comment-14161210 ]

Jian He commented on YARN-2649:
-------------------------------

[~mingma], thanks for working on this!

bq. Another way to fix it is to change MockRM.submitApp to waitForState on RMAppAttempt. That might address other test cases that use MockRM.submitApp.

I recently saw other similar test failures, e.g. YARN-2483, so maybe this is what we should do. Could you also run all the tests locally to make sure we don't introduce any regressions?
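The ordering problem in the quoted report, and why waiting on the RMAppAttempt state (the alternative quoted above) avoids it, can be illustrated with a small self-contained model. This is a hypothetical sketch, not YARN code. The class `DispatcherRaceSketch`, its handlers, and `waitFor` are invented for illustration, and only the event names are borrowed from the log. The model makes three assumptions:

- A single dispatcher drains one FIFO queue, and each handler enqueues its follow-up event at the tail.
- The AM container can only be allocated once the attempt has reached SCHEDULED.
- The waiting test sends the heartbeat as soon as its condition holds, which is the worst case for the race.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.BooleanSupplier;

// Hypothetical model of the ordering problem, NOT YARN code: one FIFO event queue drained by a
// single dispatcher. Event names are borrowed from the log; the handlers are simplified stand-ins.
final class DispatcherRaceSketch {

    enum Event { APP_ACCEPTED, ATTEMPT_START, APP_ATTEMPT_ADDED, ATTEMPT_ADDED, STATUS_UPDATE, NODE_UPDATE }

    enum AttemptState { NEW, SUBMITTED, SCHEDULED, ALLOCATED }

    private final Queue<Event> queue = new ArrayDeque<>();
    final List<Event> handled = new ArrayList<>();   // order in which events were dispatched
    boolean appAccepted = false;
    AttemptState attempt = AttemptState.NEW;

    // Dispatches the oldest event; a handler may enqueue a follow-up event at the tail.
    private void dispatchOne() {
        Event e = queue.remove();
        handled.add(e);
        switch (e) {
            case APP_ACCEPTED:
                appAccepted = true;
                queue.add(Event.ATTEMPT_START);
                break;
            case ATTEMPT_START:
                attempt = AttemptState.SUBMITTED;
                queue.add(Event.APP_ATTEMPT_ADDED);
                break;
            case APP_ATTEMPT_ADDED:                 // scheduler learns about the attempt
                queue.add(Event.ATTEMPT_ADDED);
                break;
            case ATTEMPT_ADDED:                     // model assumption: the AM container is requested here
                attempt = AttemptState.SCHEDULED;
                break;
            case STATUS_UPDATE:                     // the NM heartbeat becomes a scheduler event
                queue.add(Event.NODE_UPDATE);
                break;
            case NODE_UPDATE:                       // allocation only succeeds if the AM request exists
                if (attempt == AttemptState.SCHEDULED) {
                    attempt = AttemptState.ALLOCATED;
                }
                break;
        }
    }

    // Stand-in for a polling waitForState: dispatch until the condition holds, then return at once.
    private void waitFor(BooleanSupplier condition) {
        while (!condition.getAsBoolean()) {
            if (queue.isEmpty()) {
                throw new IllegalStateException("condition can no longer become true");
            }
            dispatchOne();
        }
    }

    private void drain() {
        while (!queue.isEmpty()) {
            dispatchOne();
        }
    }

    // Replays the test: submit the app, wait, send one NM heartbeat, then let everything settle.
    static DispatcherRaceSketch run(boolean waitForAttempt) {
        DispatcherRaceSketch rm = new DispatcherRaceSketch();
        rm.queue.add(Event.APP_ACCEPTED);                           // submitApp
        if (waitForAttempt) {
            rm.waitFor(() -> rm.attempt == AttemptState.SCHEDULED); // proposed: wait on the attempt
        } else {
            rm.waitFor(() -> rm.appAccepted);                       // current: RMAppState.ACCEPTED only
        }
        rm.queue.add(Event.STATUS_UPDATE);                          // NM heartbeat
        rm.drain();
        return rm;
    }

    public static void main(String[] args) {
        DispatcherRaceSketch flaky = run(false);
        System.out.println("wait for app:     " + flaky.attempt + " " + flaky.handled);
        DispatcherRaceSketch fixed = run(true);
        System.out.println("wait for attempt: " + fixed.attempt + " " + fixed.handled);
    }
}
```

In the first run, the attempt ends up SCHEDULED rather than ALLOCATED. Its dispatch order is STATUS_UPDATE, APP_ATTEMPT_ADDED, NODE_UPDATE, ATTEMPT_ADDED, which is the same order as in the quoted log, so the single heartbeat is wasted. In the second run, the attempt reaches ALLOCATED. In a real RM the interleaving depends on thread timing, which is why the test only fails sometimes. The model simply pins down the unlucky case.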
thx

> Flaky test TestAMRMRPCNodeUpdates
> ---------------------------------
>
>                 Key: YARN-2649
>                 URL: https://issues.apache.org/jira/browse/YARN-2649
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Ming Ma
>         Attachments: YARN-2649.patch
>
>
> Sometimes the test fails with the following error:
> testAMRMUnusableNodes(org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates)  Time elapsed: 41.73 sec  <<< FAILURE!
> junit.framework.AssertionFailedError: AppAttempt state is not correct (timedout) expected: but was:
>     at junit.framework.Assert.fail(Assert.java:50)
>     at junit.framework.Assert.failNotEquals(Assert.java:287)
>     at junit.framework.Assert.assertEquals(Assert.java:67)
>     at org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82)
>     at org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:382)
>     at org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates.testAMRMUnusableNodes(TestAMRMRPCNodeUpdates.java:125)
>
> When this happens, SchedulerEventType.NODE_UPDATE was processed before RMAppAttemptEvent.ATTEMPT_ADDED. That is possible because the test only waits for RMAppState.ACCEPTED before having the NM send a heartbeat. This can be reproduced using a custom AsyncDispatcher with a CountDownLatch. Here is the log when this happens.
> {noformat}
> App State is : ACCEPTED
> 2014-10-05 21:25:07,305 INFO [AsyncDispatcher event handler] attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(670)) - appattempt_1412569506932_0001_000001 State change from NEW to SUBMITTED
> 2014-10-05 21:25:07,305 DEBUG [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeStatusEvent.EventType: STATUS_UPDATE
> 2014-10-05 21:25:07,305 DEBUG [AsyncDispatcher event handler] rmnode.RMNodeImpl (RMNodeImpl.java:handle(384)) - Processing 127.0.0.1:1234 of type STATUS_UPDATE
> AppAttempt : appattempt_1412569506932_0001_000001 State is : SUBMITTED Waiting for state : ALLOCATED
> 2014-10-05 21:25:07,306 DEBUG [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.AppAttemptAddedSchedulerEvent.EventType: APP_ATTEMPT_ADDED
> 2014-10-05 21:25:07,328 DEBUG [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeUpdateSchedulerEvent.EventType: NODE_UPDATE
> 2014-10-05 21:25:07,330 DEBUG [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptEvent.EventType: ATTEMPT_ADDED
> 2014-10-05 21:25:07,331 DEBUG [AsyncDispatcher event handler] attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(658)) - Processing event for appattempt_1412569506932_0001_000001 of type ATTEMPT_ADDED
> 2014-10-05 21:25:07,333 INFO [AsyncDispatcher event handler] attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(670)) - appattempt_1412569506932_0001_000001 State change from SUBMITTED to SCHEDULED
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)