Date: Tue, 12 Jul 2016 16:02:20 +0000 (UTC)
From: "Jun Gong (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Updated] (YARN-5333) Some recovered apps are put into default queue when RM HA
Mailing-List: contact yarn-issues-help@hadoop.apache.org; run by ezmlm
archived-at: Tue, 12 Jul 2016 16:02:23 -0000

     [ https://issues.apache.org/jira/browse/YARN-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Gong updated YARN-5333:
---------------------------
    Description:
Enable RM HA and use FairScheduler, {{yarn.scheduler.fair.allow-undeclared-pools}} is set to false, {{yarn.scheduler.fair.user-as-default-queue}} is set
to false.

Reproduce steps:
1. Start two RMs.
2. After the RMs are running, edit {{etc/hadoop/fair-scheduler.xml}} on both RMs to add some queues.
3. Submit some apps to the newly added queues.
4. Stop the active RM; the standby RM will then transition to active and recover apps.

However, the new active RM will put recovered apps into the default queue because it might not have loaded the new {{fair-scheduler.xml}} yet. We need to call {{initScheduler}} before starting active services, or move {{refreshAll()}} ahead of {{rm.transitionToActive()}}. *This seems important for other schedulers as well*.

  was:
Enable RM HA and use FairScheduler, {{yarn.scheduler.fair.allow-undeclared-pools}} is set to false, {{yarn.scheduler.fair.user-as-default-queue}} is set to false.

Reproduce steps:
1. Start two RMs.
2. After the RMs are running, edit {{etc/hadoop/fair-scheduler.xml}} on both RMs to add some queues.
3. Submit some apps to the newly added queues.
4. Stop the active RM; the standby RM will then transition to active and recover apps.

However, the new active RM will reject recovered apps because it might not have loaded the new {{fair-scheduler.xml}} yet. We need to call {{initScheduler}} before starting active services, or move {{refreshAll()}} ahead of {{rm.transitionToActive()}}. *This seems important for other schedulers as well*.

Related logs are as follows:
{quote}
2016-07-07 16:55:34,756 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Recover ended
...
2016-07-07 16:55:34,824 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService: Loading allocation file /gaia/hadoop/etc/hadoop/fair-scheduler.xml
2016-07-07 16:55:34,826 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Application rejected by queue placement policy
2016-07-07 16:55:34,828 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Application appattempt_1467803586002_0006_000001 is done.
finalState=FAILED
2016-07-07 16:55:34,828 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Unknown application appattempt_1467803586002_0006_000001 has completed!
2016-07-07 16:55:34,828 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Application rejected by queue placement policy
2016-07-07 16:55:34,828 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Application appattempt_1467803586002_0004_000001 is done. finalState=FAILED
2016-07-07 16:55:34,828 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Unknown application appattempt_1467803586002_0004_000001 has completed!
2016-07-07 16:55:34,828 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APP_REJECTED at ACCEPTED
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:697)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:88)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:718)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:702)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:191)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:124)
	at java.lang.Thread.run(Thread.java:745)
{quote}


> Some recovered apps are put into default queue when RM HA
> ---------------------------------------------------------
>
>                 Key: YARN-5333
>                 URL: https://issues.apache.org/jira/browse/YARN-5333
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Jun Gong
>            Assignee: Jun Gong
>         Attachments: YARN-5333.01.patch
>
>
> Enable RM HA and use FairScheduler, {{yarn.scheduler.fair.allow-undeclared-pools}} is set to false, {{yarn.scheduler.fair.user-as-default-queue}} is set to false.
> Reproduce steps:
> 1. Start two RMs.
> 2. After the RMs are running, edit {{etc/hadoop/fair-scheduler.xml}} on both RMs to add some queues.
> 3. Submit some apps to the newly added queues.
> 4. Stop the active RM; the standby RM will then transition to active and recover apps.
> However, the new active RM will put recovered apps into the default queue because it might not have loaded the new {{fair-scheduler.xml}} yet. We need to call {{initScheduler}} before starting active services, or move {{refreshAll()}} ahead of {{rm.transitionToActive()}}. *This seems important for other schedulers as well*.
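
The ordering problem in the description can be illustrated with a small, self-contained Java sketch. This is a toy model only, not ResourceManager or FairScheduler code: {{ToyScheduler}}, {{reloadAllocations}}, {{place}}, {{failover}}, the queue name {{root.newq}}, and the app ID {{app_0006}} are all hypothetical names introduced for illustration. The toy scheduler is assumed to fall back to the default queue when it does not know the requested queue, mirroring the behavior reported in this issue.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Toy model of the failover ordering issue. Not YARN code; all names are hypothetical. */
class FailoverOrderSketch {
    static final String DEFAULT_QUEUE = "default";

    /** Stand-in for a scheduler that only knows the queues from the last allocation file it loaded. */
    static class ToyScheduler {
        private final Set<String> knownQueues = new HashSet<>();

        ToyScheduler() {
            knownQueues.add(DEFAULT_QUEUE);
        }

        /** Replaces the known queues with the ones currently declared in the (simulated) file. */
        void reloadAllocations(Set<String> queuesInFile) {
            knownQueues.clear();
            knownQueues.add(DEFAULT_QUEUE);
            knownQueues.addAll(queuesInFile);
        }

        /** Assumption of this toy model: an unknown queue falls back to the default queue. */
        String place(String requestedQueue) {
            return knownQueues.contains(requestedQueue) ? requestedQueue : DEFAULT_QUEUE;
        }
    }

    /**
     * Simulates a standby instance becoming active.
     * savedApps maps app ID to the queue it was originally submitted to.
     * Returns app ID mapped to the queue it ends up in after recovery.
     */
    static Map<String, String> failover(boolean reloadFirst,
                                        Set<String> queuesInFile,
                                        Map<String, String> savedApps) {
        ToyScheduler scheduler = new ToyScheduler();
        if (reloadFirst) {
            scheduler.reloadAllocations(queuesInFile); // ordering proposed in the description
        }
        Map<String, String> placed = new LinkedHashMap<>();
        for (Map.Entry<String, String> app : savedApps.entrySet()) {
            placed.put(app.getKey(), scheduler.place(app.getValue()));
        }
        if (!reloadFirst) {
            scheduler.reloadAllocations(queuesInFile); // too late: apps are already placed
        }
        return placed;
    }

    public static void main(String[] args) {
        Set<String> queuesInFile = Collections.singleton("root.newq");
        Map<String, String> savedApps = Collections.singletonMap("app_0006", "root.newq");

        System.out.println("recover before reload: " + failover(false, queuesInFile, savedApps));
        System.out.println("reload before recover: " + failover(true, queuesInFile, savedApps));
    }
}
```

In this sketch, recovering before the reload leaves {{app_0006}} in {{default}}, while reloading first keeps it in {{root.newq}}. Whether the real fix belongs in {{initScheduler}} or in the ordering of {{refreshAll()}} is the question this issue raises; the sketch only shows why the order matters.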