hadoop-yarn-issues mailing list archives

From "Sunil G (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (YARN-7643) Handle recovery of applications on auto-created leaf queues
Date Wed, 13 Dec 2017 06:48:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16288758#comment-16288758
] 

Sunil G edited comment on YARN-7643 at 12/13/17 6:47 AM:
---------------------------------------------------------

Thanks [~suma.shivaprasad]. Some comments below.
1. In this method:
{code}
  void replaceQueueFromPlacementContext(
      ApplicationPlacementContext placementContext,
      ApplicationSubmissionContext context) {
    // Set it to ApplicationSubmissionContext
    //apply queue mapping only to new application submissions
    if (placementContext != null && !StringUtils.equalsIgnoreCase(
        context.getQueue(), placementContext.getQueue())) {
      LOG.info("Placed application=" + context.getApplicationId() +
          " to queue=" + placementContext.getQueue() + ", original queue="
          + context
          .getQueue());
      context.setQueue(placementContext.getQueue());
    }
  }
{code}
The queue chosen by placement is already written into the submission context during application
submission, so during recovery we already have the mapped queue name. Hence {{UserGroupMappingPlacementRule.getPlacementForApp}}
will return the correct mapped queue name, yet we still redo the same action. As I understand it,
the current issue arises because the event below has to be fired from RMAppImpl to the scheduler, and
*placementContext* will be null in the recovery case (might this also break normal user-mapping?).
{code}
      app.scheduler.handle(
          new AppAddedSchedulerEvent(app.user, app.submissionContext, true,
              app.applicationPriority, app.placementContext));
{code}
A couple of suggestions:
a. Could we save *placementContext* under the app data in the state store?
b. While recomputing *placeApplication*, could we bypass some API calls from {{PlacementManager}},
since we already have the mapped queue name?
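To make suggestion (b) concrete, here is a minimal sketch, not the actual patch. {{RecoveryPlacementSketch}}, its nested {{PlacementContext}}, {{addUserMapping}} and the {{isRecovery}} flag are all hypothetical stand-ins and not the real YARN classes. The sketch carries only the queue name, so anything else the real placement context holds is out of scope here.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the placement path; not the real YARN classes. */
class RecoveryPlacementSketch {

  /** Minimal stand-in for a placement context: only the target queue name. */
  static final class PlacementContext {
    private final String queue;

    PlacementContext(String queue) {
      this.queue = queue;
    }

    String getQueue() {
      return queue;
    }
  }

  // Hypothetical user -> queue mapping, standing in for the placement rules.
  private final Map<String, String> userToQueue = new HashMap<>();

  // Counts rule evaluations so the bypass on recovery is observable.
  private int ruleEvaluations = 0;

  void addUserMapping(String user, String queue) {
    userToQueue.put(user, queue);
  }

  int getRuleEvaluations() {
    return ruleEvaluations;
  }

  /** Stand-in for evaluating the placement rules; returns null if no rule matches. */
  private PlacementContext evaluateRules(String user) {
    ruleEvaluations++;
    String queue = userToQueue.get(user);
    return queue == null ? null : new PlacementContext(queue);
  }

  /**
   * On recovery, the queue stored in the submission context already reflects
   * the mapping applied at submission time, so it is reused and the rules are
   * not evaluated again. New submissions still go through the rules.
   */
  PlacementContext placeApplication(String user, String storedQueue,
      boolean isRecovery) {
    if (isRecovery && storedQueue != null) {
      return new PlacementContext(storedQueue);
    }
    return evaluateRules(user);
  }
}
```

With this shape the recovery path still hands a non-null context to the scheduler event, which is the part that is missing today.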


2. Could we optimize {{addApplicationOnRecovery}} in CS further? The multiple if checks are a bit
confusing. Maybe we can create {{getQueueWithMappings}} and call it from addApplication/OnRecovery
instead of getQueue; it would get the queue and apply the mapping if needed. Just a bit of
refactoring.
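A rough sketch of the refactoring in point 2, under stated assumptions: {{QueueLookupSketch}}, {{addQueue}} and the string results are hypothetical, and only the helper name {{getQueueWithMappings}} comes from the suggestion above. The idea is that both entry points share one lookup that applies the mapping, so neither needs its own chain of if checks.

```java
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

/** Hypothetical stand-in for the scheduler's queue lookup; not the real CapacityScheduler. */
class QueueLookupSketch {

  // Names of queues that currently exist in this sketch.
  private final Set<String> knownQueues = new HashSet<>();

  void addQueue(String name) {
    knownQueues.add(name);
  }

  /**
   * Shared helper: prefer the mapped queue when a mapping is present,
   * otherwise fall back to the submitted queue, then look the name up.
   */
  Optional<String> getQueueWithMappings(String submittedQueue,
      String mappedQueue) {
    String target = (mappedQueue != null) ? mappedQueue : submittedQueue;
    return knownQueues.contains(target)
        ? Optional.of(target)
        : Optional.empty();
  }

  /** New submission path: only the outcome label differs from recovery. */
  String addApplication(String submittedQueue, String mappedQueue) {
    return getQueueWithMappings(submittedQueue, mappedQueue)
        .map(q -> "ACCEPTED:" + q)
        .orElse("REJECTED");
  }

  /** Recovery path: same lookup, different handling of a missing queue. */
  String addApplicationOnRecovery(String submittedQueue, String mappedQueue) {
    return getQueueWithMappings(submittedQueue, mappedQueue)
        .map(q -> "RECOVERED:" + q)
        .orElse("RECOVERY_FAILED");
  }
}
```

How a missing queue should be handled on each path is left open here; the sketch only shows the lookup being shared.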




> Handle recovery of applications on auto-created leaf queues
> -----------------------------------------------------------
>
>                 Key: YARN-7643
>                 URL: https://issues.apache.org/jira/browse/YARN-7643
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacity scheduler
>            Reporter: Suma Shivaprasad
>            Assignee: Suma Shivaprasad
>         Attachments: YARN-7643.1.patch, YARN-7643.2.patch
>
>
> CapacityScheduler application recovery should auto-create the leaf queue if it doesn't exist.
> Also, RMAppManager needs to set the queue-mapping placement context so that the scheduler has the
> necessary placement context to recreate the queue.



