Return-Path:
X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io
Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io
Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183])
    by cust-asf2.ponee.io (Postfix) with ESMTP id ACB53200BD4
    for ; Fri, 2 Dec 2016 01:40:00 +0100 (CET)
Received: by cust-asf.ponee.io (Postfix)
    id AB595160B0B; Fri, 2 Dec 2016 00:40:00 +0000 (UTC)
Delivered-To: archive-asf-public@cust-asf.ponee.io
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
    by cust-asf.ponee.io (Postfix) with SMTP id F3C8C160B10
    for ; Fri, 2 Dec 2016 01:39:59 +0100 (CET)
Received: (qmail 8712 invoked by uid 500); 2 Dec 2016 00:39:58 -0000
Mailing-List: contact mapreduce-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Delivered-To: mailing list mapreduce-issues@hadoop.apache.org
Received: (qmail 8433 invoked by uid 99); 2 Dec 2016 00:39:58 -0000
Received: from arcas.apache.org (HELO arcas) (140.211.11.28)
    by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 02 Dec 2016 00:39:58 +0000
Received: from arcas.apache.org (localhost [127.0.0.1])
    by arcas (Postfix) with ESMTP id A1BBD2C2A6D
    for ; Fri, 2 Dec 2016 00:39:58 +0000 (UTC)
Date: Fri, 2 Dec 2016 00:39:58 +0000 (UTC)
From: "Haibo Chen (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Message-ID:
In-Reply-To:
References:
Subject: [jira] [Comment Edited] (MAPREDUCE-6815) Fix flaky TestKill.testKillTask()
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394
archived-at: Fri, 02 Dec 2016 00:40:00 -0000


    [ https://issues.apache.org/jira/browse/MAPREDUCE-6815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15713547#comment-15713547 ]

Haibo Chen edited comment on MAPREDUCE-6815 at 12/2/16 12:39 AM:
-----------------------------------------------------------------

bq.
2016-11-23 10:08:07,725 ERROR [AsyncDispatcher event handler] impl.JobImpl (JobImpl.java:handle(1004)) - Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: JOB_TASK_COMPLETED at SETUP
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:1002)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:140)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1465)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1461)
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:187)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
        at java.lang.Thread.run(Thread.java:745)

In TestKill.testKillTask(), we call app.waitForState(job, JobState.RUNNING) to wait for the job to be running. But when the job is in the RUNNING state externally, its internal state can be either SETUP or RUNNING. If the job is still in SETUP, sending the task_kill event eventually delivers JOB_TASK_COMPLETED to a job that cannot handle it yet, and the job fails as shown above. We could wait on the internal state instead of the external state.


was (Author: haibochen):
bq.
2016-11-23 10:08:07,725 ERROR [AsyncDispatcher event handler] impl.JobImpl (JobImpl.java:handle(1004)) - Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: JOB_TASK_COMPLETED at SETUP
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:1002)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:140)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1465)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1461)
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:187)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
        at java.lang.Thread.run(Thread.java:745)

In TestKill.testKillTask(), we do app.waitForState(job, JobState.RUNNING) to wait for the job to be running. But when the job is in running state externally, it can be in either SetUp or Running internally.
If the Job is still in SetUp state, sending task_kill event will eventually cause the job to fail as shown above.


> Fix flaky TestKill.testKillTask()
> ---------------------------------
>
>                 Key: MAPREDUCE-6815
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6815
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Haibo Chen
>            Assignee: Haibo Chen
>
> Error Message
> Job state is not correct (timedout) expected: but was:
> Stacktrace
> java.lang.AssertionError: Job state is not correct (timedout) expected: but was:
>         at org.junit.Assert.fail(Assert.java:88)
>         at org.junit.Assert.failNotEquals(Assert.java:743)
>         at org.junit.Assert.assertEquals(Assert.java:118)
>         at org.apache.hadoop.mapreduce.v2.app.MRApp.waitForState(MRApp.java:416)
>         at org.apache.hadoop.mapreduce.v2.app.TestKill.testKillTask(TestKill.java:124)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-help@hadoop.apache.org
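
For illustration only, the external/internal state mismatch described in the comment can be sketched in plain Java. Everything below is a hypothetical stand-in and not the real Hadoop code: the `Internal` and `External` enums, `toExternal`, and `waitForInternal` are invented names, and the mapping of SETUP to RUNNING is an assumption taken from the comment's description. The sketch shows why a wait on the external state can return while the job is still setting up, and how polling the internal state avoids that.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in types; these are NOT the real Hadoop classes.
final class JobStateSketch {
    enum Internal { NEW, INITED, SETUP, RUNNING, SUCCEEDED }
    enum External { NEW, INITED, RUNNING, SUCCEEDED }

    // Assumption from the comment: a job in SETUP is reported as RUNNING externally.
    static External toExternal(Internal s) {
        switch (s) {
            case SETUP:
                return External.RUNNING;
            default:
                return External.valueOf(s.name());
        }
    }

    // Poll until the internal state equals `expected`; return false on timeout or interrupt.
    static boolean waitForInternal(AtomicReference<Internal> state,
                                   Internal expected, long timeoutMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (state.get() != expected) {
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        AtomicReference<Internal> state = new AtomicReference<>(Internal.SETUP);

        // A wait on the external state would already pass here,
        // even though the job has not finished setting up.
        System.out.println("external RUNNING while internal is " + state.get() + ": "
                + (toExternal(state.get()) == External.RUNNING));

        // Simulate setup finishing a little later on another thread.
        Thread setup = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            state.set(Internal.RUNNING);
        });
        setup.start();

        // Waiting on the internal state only returns once setup is really done.
        boolean ready = waitForInternal(state, Internal.RUNNING, 2000);
        System.out.println("internal RUNNING reached: " + ready);
    }
}
```

In the real test, the equivalent change would be to wait on the job's internal state before sending the kill event; which helper the actual patch uses is not stated in this message.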