hadoop-yarn-issues mailing list archives

From "Sunil G (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5333) Some recovered apps are put into default queue when RM HA
Date Thu, 21 Jul 2016 15:39:20 GMT

    [ https://issues.apache.org/jira/browse/YARN-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15387915#comment-15387915

Sunil G commented on YARN-5333:

[~hex108], thanks for the clarification. With YARN-3893, we were trying to make the RM fail fast
if a wrong capacity-scheduler configuration is present. With the current patch,
         try {
+          reinitializeActiveServices();
           return null;
         } catch (Exception e) {
any exception during queue reinitialization will not make the RM fail fast. So I think you can
put {{reinitializeActiveServices}} in a separate try block and invoke RM fail-fast in its
exception-handling block.
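
The suggestion above could be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the patch itself: {{ActiveServices}}, {{FatalHandler}} and {{transitionToActive}} here are hypothetical stand-ins, not real YARN types, and the real RM would trigger its own shutdown path where the sketch calls {{onFatal}}.

```java
public class FailFastSketch {

    /** Hypothetical stand-in for the RM's active services; not a real YARN type. */
    public interface ActiveServices {
        void reinitialize() throws Exception;
        void start();
    }

    /** Hypothetical fatal-error hook; a real RM would shut itself down here. */
    public interface FatalHandler {
        void onFatal(String message, Exception cause);
    }

    /**
     * Reinitialization gets its own try block, so a failure there is reported
     * as fatal instead of being absorbed by a broader catch further out.
     */
    public static boolean transitionToActive(ActiveServices services, FatalHandler fatal) {
        try {
            services.reinitialize();
        } catch (Exception e) {
            fatal.onFatal("Failed to reinitialize active services", e);
            return false; // services are never started on a bad configuration
        }
        services.start();
        return true;
    }

    public static void main(String[] args) {
        ActiveServices broken = new ActiveServices() {
            @Override public void reinitialize() throws Exception {
                throw new java.io.IOException("invalid scheduler configuration");
            }
            @Override public void start() { }
        };
        boolean active = transitionToActive(broken,
                (msg, cause) -> System.err.println("FATAL: " + msg + ": " + cause.getMessage()));
        System.out.println("became active: " + active);
    }
}
```

The point of the separate block is only that the reinitialization failure has a dedicated handler; how the RM actually aborts is left out of the sketch.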
However, one more thing worries me. With this patch, queue reinitialization is done before
starting the active services, so many services such as the node label manager are not yet
started (nor are the dispatcher threads). If {{reinitialize}} triggers any event call flow,
that could be a problem. As far as I checked, though, no such event handling is present in the
{{reinitialize}} call flow. Still, I suggest confirming this once; I will also verify and will
update if I find some

> Some recovered apps are put into default queue when RM HA
> ---------------------------------------------------------
>                 Key: YARN-5333
>                 URL: https://issues.apache.org/jira/browse/YARN-5333
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Jun Gong
>            Assignee: Jun Gong
>         Attachments: YARN-5333.01.patch, YARN-5333.02.patch, YARN-5333.03.patch
> Enable RM HA and use FairScheduler, with {{yarn.scheduler.fair.allow-undeclared-pools}}
> set to false and {{yarn.scheduler.fair.user-as-default-queue}} set to false.
> Steps to reproduce:
> 1. Start two RMs.
> 2. After the RMs are running, edit {{etc/hadoop/fair-scheduler.xml}} on both RMs and
> add some queues.
> 3. Submit some apps to the newly added queues.
> 4. Stop the active RM; the standby RM will then transition to active and recover the apps.
> However, the new active RM will put the recovered apps into the default queue because it
> might not have loaded the new {{fair-scheduler.xml}}. We need to call {{initScheduler}} before
> starting active services, or move {{refreshAll()}} ahead of {{rm.transitionToActive()}}.
> *It seems this is also important for other schedulers*.
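
The ordering problem described in the issue can be illustrated with a toy model. This is a sketch under assumptions, not FairScheduler code: {{ToyScheduler}}, {{reload}} and {{place}} are hypothetical names, and the fallback to a default queue simply mimics the behaviour reported above for undeclared queues.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class RecoveryOrderSketch {
    public static final String DEFAULT_QUEUE = "root.default";

    /** Toy scheduler that only knows the queues it has loaded so far (hypothetical). */
    public static class ToyScheduler {
        private Set<String> queues = new HashSet<>(Collections.singleton(DEFAULT_QUEUE));

        /** Replace the known queues with the ones found in the configuration. */
        public void reload(Set<String> configuredQueues) {
            Set<String> next = new HashSet<>(configuredQueues);
            next.add(DEFAULT_QUEUE);
            queues = next;
        }

        /** Undeclared queues fall back to the default queue. */
        public String place(String requestedQueue) {
            return queues.contains(requestedQueue) ? requestedQueue : DEFAULT_QUEUE;
        }
    }

    public static void main(String[] args) {
        // Queue added to the configuration file while this RM was standby.
        Set<String> onDisk = new HashSet<>(Collections.singleton("root.newq"));
        ToyScheduler standby = new ToyScheduler();

        // Recovering before reloading: the app lands in the default queue.
        System.out.println(standby.place("root.newq"));

        // Reloading first: the app is recovered into the queue it was submitted to.
        standby.reload(onDisk);
        System.out.println(standby.place("root.newq"));
    }
}
```

In this model the only difference between the two outcomes is whether the reload happens before or after the recovered app is placed, which is the ordering the issue proposes to change.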

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org
