Date: Wed, 10 Jan 2018 03:04:03 +0000 (UTC)
From: "lujie (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Comment Edited] (YARN-7663) RMAppImpl:Invalid event: START at KILLED

    [ https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16310909#comment-16310909 ]

lujie (xiaoheipangzi) edited comment on YARN-7663 at 1/10/18 3:03 AM:
------------------------------------------------------

After reading Jason Lowe's useful suggestion, I rewrote the unit test and attached a new patch. In this patch, I do three things:
1. Add an empty protected method, onInvalidStateTransition, and call it from the code block where RMAppImpl handles an InvalidStateTransitionException.
2. Create a new final class, RMAppImplForTest, which overrides onInvalidStateTransition. In createNewTestApp, create an RMAppImplForTest object instead of an RMAppImpl.
3. Fix this bug by ignoring the event.

However, two other invalid state transitions occur while testing:
1. testAppAcceptedFailed: APP_ACCEPTED at state ACCEPTED
2. testAppRunningFailed: APP_UPDATE_SAVED at state KILLED

These two invalid transitions may be bugs in RMAppImpl, or they may be bugs in TestRMAppTransitions.
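Steps 1 and 2 of the patch describe a test hook: production code calls an empty protected method when it swallows an invalid transition, and a test-only subclass overrides that method so the problem becomes visible to the test. A minimal sketch of the pattern follows, assuming a toy state machine. `SimpleApp`, `SimpleAppForTest` and `InvalidTransitionException` are hypothetical stand-ins, not the real `RMAppImpl`, `RMAppImplForTest` or `InvalidStateTransitionException`, and the real hook's parameters may differ. Here the override records the transition so a test can assert on it; an actual test subclass could instead fail the test immediately.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Hadoop's InvalidStateTransitionException.
class InvalidTransitionException extends RuntimeException {
    InvalidTransitionException(String msg) { super(msg); }
}

// Hypothetical, simplified stand-in for RMAppImpl (NOT the real Hadoop class).
class SimpleApp {
    enum State { NEW, RUNNING, KILLED }
    enum Event { START, KILL }

    private State state = State.NEW;

    State getState() { return state; }

    // An invalid event is logged and dropped; the state stays unchanged.
    void handle(Event event) {
        try {
            state = transition(state, event);
        } catch (InvalidTransitionException e) {
            System.err.println("Can't handle this event at current state: " + e.getMessage());
            onInvalidStateTransition(event, state);
        }
    }

    // Empty hook: does nothing in production; test subclasses override it.
    protected void onInvalidStateTransition(Event event, State current) {
    }

    private static State transition(State s, Event e) {
        switch (s) {
            case NEW:
                if (e == Event.START) return State.RUNNING;
                if (e == Event.KILL) return State.KILLED;
                break;
            case RUNNING:
                if (e == Event.KILL) return State.KILLED;
                break;
            default:
                break;
        }
        throw new InvalidTransitionException("Invalid event: " + e + " at " + s);
    }
}

// Hypothetical test-only subclass (analogous in spirit to RMAppImplForTest):
// it records every invalid transition instead of silently ignoring it.
final class SimpleAppForTest extends SimpleApp {
    final List<String> invalid = new ArrayList<>();

    @Override
    protected void onInvalidStateTransition(Event event, State current) {
        invalid.add(event + " at " + current);
    }
}

class HookDemo {
    public static void main(String[] args) {
        SimpleAppForTest app = new SimpleAppForTest();
        app.handle(SimpleApp.Event.KILL);
        app.handle(SimpleApp.Event.START); // late START: invalid in this toy model
        System.out.println(app.getState() + " " + app.invalid);
    }
}
```

Running `HookDemo.main` prints `KILLED [START at KILLED]`. Because the hook is empty in the base class, behaviour outside tests is unchanged; only the test subclass turns the swallowed transition into something a test can check.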
Would it be better to defer these two transitions to another JIRA? Or can we just ignore these events without a test, as was done in [YARN-4598|https://issues.apache.org/jira/browse/YARN-4598]?

> RMAppImpl:Invalid event: START at KILLED
> ----------------------------------------
>
>                 Key: YARN-7663
>                 URL: https://issues.apache.org/jira/browse/YARN-7663
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.8.0
>            Reporter: lujie
>            Assignee: lujie
>            Priority: Minor
>              Labels: patch
>             Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4
>
>         Attachments: YARN-7663_1.patch, YARN-7663_2.patch, YARN-7663_3.patch, YARN-7663_4.patch, YARN-7663_5.patch, YARN-7663_6.patch, YARN-7663_7.patch
>
>
> After a kill is sent to the application, the RM log shows:
> {code:java}
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: START at KILLED
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> 	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805)
> 	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
> 	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901)
> 	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> 	at java.lang.Thread.run(Thread.java:745)
> {code}
> If a sleep is inserted before the point where the START event is created, this bug reproduces deterministically.
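The reproduction hint points to an ordering race: if the START event is delayed, a KILL can be processed first, and the late START then reaches an application that is already KILLED. Step 3 of the comment fixes this by ignoring the event. The sketch below models both under stated assumptions: `AppStateMachine` and `RaceDemo` are hypothetical names, the state set is reduced (the real `RMAppImpl` builds its table with `StateMachineFactory` and has intermediate states that are collapsed here), and "ignoring" is modelled as a KILLED-to-KILLED self-transition on START. Enqueuing KILL before START plays the role of the inserted sleep, making the order deterministic.

```java
import java.util.ArrayDeque;
import java.util.EnumMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical, reduced model of the application state machine (not Hadoop code).
class AppStateMachine {
    enum State { NEW, SUBMITTED, KILLED }
    enum Event { START, KILL }

    private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
    private State state = State.NEW;

    AppStateMachine(boolean ignoreStartWhenKilled) {
        for (State s : State.values()) {
            table.put(s, new EnumMap<Event, State>(Event.class));
        }
        table.get(State.NEW).put(Event.START, State.SUBMITTED);
        table.get(State.NEW).put(Event.KILL, State.KILLED);
        table.get(State.SUBMITTED).put(Event.KILL, State.KILLED);
        table.get(State.KILLED).put(Event.KILL, State.KILLED);
        if (ignoreStartWhenKilled) {
            // Modelled fix: a late START at KILLED is absorbed as a self-transition.
            table.get(State.KILLED).put(Event.START, State.KILLED);
        }
    }

    State getState() { return state; }

    // Returns false and leaves the state unchanged when no transition exists.
    boolean handle(Event e) {
        State next = table.get(state).get(e);
        if (next == null) {
            System.err.println("Invalid event: " + e + " at " + state);
            return false;
        }
        state = next;
        return true;
    }
}

class RaceDemo {
    // Drains events in arrival order, like a single dispatcher thread; counts rejections.
    static int dispatch(AppStateMachine app, Queue<AppStateMachine.Event> events) {
        int invalid = 0;
        while (!events.isEmpty()) {
            if (!app.handle(events.poll())) {
                invalid++;
            }
        }
        return invalid;
    }

    public static void main(String[] args) {
        for (boolean fixed : new boolean[] {false, true}) {
            Queue<AppStateMachine.Event> q = new ArrayDeque<>();
            // The delayed START arrives after the user's KILL.
            q.add(AppStateMachine.Event.KILL);
            q.add(AppStateMachine.Event.START);
            AppStateMachine app = new AppStateMachine(fixed);
            int invalid = dispatch(app, q);
            System.out.println("fixed=" + fixed + " state=" + app.getState() + " invalid=" + invalid);
        }
    }
}
```

Running `RaceDemo.main` prints `fixed=false state=KILLED invalid=1` and then `fixed=true state=KILLED invalid=0`. In both cases the application ends up KILLED; the extra rule only stops the late START from being reported as an invalid transition.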