hadoop-yarn-issues mailing list archives

From "Jun Gong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5333) Some recovered apps are put into default queue when RM HA
Date Thu, 21 Jul 2016 11:19:20 GMT

    [ https://issues.apache.org/jira/browse/YARN-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15387533#comment-15387533 ]

Jun Gong commented on YARN-5333:

{quote}Could you also please confirm whether you have added the new queue manually in capacity-scheduler.xml
of the standby node, and test the same scenario.{quote}
I copied capacity-scheduler.xml from the active RM to the standby RM, so the file is the same on both
RMs. Yes, I tested the same scenario.

{quote}Because the current approach in your patch will induce a new problem. Suppose capacity-scheduler.xml
is corrupted; then we will see a case where both RMs toggle to become active. We had discussed
this solution in another HA ticket and had thought about not trying to do any refresh until
active services are started.{quote}
If capacity-scheduler.xml was corrupted, I saw the RM crash during RM HA failover because it failed
to validate the configuration ({{CapacityScheduler.validateConf}}). (Note: when capacity-scheduler.xml is corrupted,
running {{refreshQueues}} will just fail and not cause the RM to crash.) Is there something I

> Some recovered apps are put into default queue when RM HA
> ---------------------------------------------------------
>                 Key: YARN-5333
>                 URL: https://issues.apache.org/jira/browse/YARN-5333
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Jun Gong
>            Assignee: Jun Gong
>         Attachments: YARN-5333.01.patch, YARN-5333.02.patch, YARN-5333.03.patch
> Enable RM HA and use FairScheduler, with {{yarn.scheduler.fair.allow-undeclared-pools}}
> set to false and {{yarn.scheduler.fair.user-as-default-queue}} set to false.
> Reproduce steps:
> 1. Start two RMs.
> 2. After the RMs are running, change the file {{etc/hadoop/fair-scheduler.xml}} on both RMs
> to add some queues.
> 3. Submit some apps to the newly added queues.
> 4. Stop the active RM; the standby RM will then transition to active and recover apps.
> However, the new active RM will put recovered apps into the default queue because it might
> not have loaded the new {{fair-scheduler.xml}}. We need to call {{initScheduler}} before starting
> active services, or move {{refreshAll()}} in front of {{rm.transitionToActive()}}. *It seems
> this is also important for other schedulers*.
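
The ordering problem described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: {{ToyScheduler}}, {{reloadQueues}}, {{placeRecoveredApp}} and {{recover}} are hypothetical names invented for illustration, not Hadoop APIs, and the fallback to "default" simply mirrors the behaviour reported in this issue rather than the actual FairScheduler placement code.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical toy model of the failover ordering; none of these names are Hadoop APIs.
class FailoverOrderingSketch {

    /** Stand-in for a scheduler that only knows the queues it has loaded into memory. */
    static class ToyScheduler {
        private final Set<String> loadedQueues = new HashSet<>();

        ToyScheduler() {
            loadedQueues.add("default"); // state loaded before the queue file was edited
        }

        /** Models re-reading the queue definitions from disk. */
        void reloadQueues(Set<String> queuesOnDisk) {
            loadedQueues.clear();
            loadedQueues.add("default");
            loadedQueues.addAll(queuesOnDisk);
        }

        /** Models recovery placement: a queue the scheduler has not loaded falls back to "default". */
        String placeRecoveredApp(String requestedQueue) {
            return loadedQueues.contains(requestedQueue) ? requestedQueue : "default";
        }
    }

    /** Returns the queue a recovered app ends up in for the given ordering. */
    static String recover(boolean refreshBeforeRecovery) {
        Set<String> queuesOnDisk = new HashSet<>();
        queuesOnDisk.add("newqueue"); // added to the file after the standby RM started

        ToyScheduler standby = new ToyScheduler();
        if (refreshBeforeRecovery) {
            standby.reloadQueues(queuesOnDisk); // refresh first, then recover
        }
        String placed = standby.placeRecoveredApp("newqueue");
        if (!refreshBeforeRecovery) {
            standby.reloadQueues(queuesOnDisk); // too late: the app was already placed
        }
        return placed;
    }

    public static void main(String[] args) {
        System.out.println("refresh after recovery:  " + recover(false));
        System.out.println("refresh before recovery: " + recover(true));
    }
}
```

In this model, {{recover(false)}} returns "default" and {{recover(true)}} returns "newqueue", which is the difference the proposed reordering is meant to achieve.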

This message was sent by Atlassian JIRA
