hadoop-yarn-issues mailing list archives

From "Sandy Ryza (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1495) Allow moving apps between queues
Date Fri, 13 Dec 2013 22:32:11 GMT

    [ https://issues.apache.org/jira/browse/YARN-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848006#comment-13848006 ]

Sandy Ryza commented on YARN-1495:

bq. We have to touch RMApp etc before hitting scheduler as state in RM is partitioned inside
and outside scheduler.
Sorry, I wasn't clear - definitely agree we need to go through RM app, just was wondering
whether to do it with events or synchronously.  Thanks for the heads up on the race condition
- will watch out for that.

bq. The paradigm followed is a multi-phase request
An issue with doing a multi-phase request is that, if the move fails, we would like to return
an appropriate error message with the reason to the client, and the reason can go as far down
as the scheduler.  We could give the client a request ID that they could come back with to
find the result, but that kind of seems like overkill to me.  While async/multi-phase requests
100% make sense to me in situations like the AMRM protocol where requests come in all the
time, moves will normally be human-initiated requests that come with very low frequency. 
I'll write the code with events, which will allow us to take either the blocking (with a Future)
or non-blocking approach.
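
To illustrate how one event can serve both styles, here is a minimal sketch using plain java.util.concurrent rather than the RM's actual event classes. MoveAppEvent, handle, dispatch and moveBlocking are hypothetical names, and the empty-queue check only stands in for whatever the scheduler would really reject. The event carries a future; the handler completes it with success or with the failure reason, and the caller chooses whether to wait on it or attach a callback.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class MoveEventSketch {
    /** Hypothetical event: the move request plus a future that the handler completes. */
    static final class MoveAppEvent {
        final String appId;
        final String targetQueue;
        final CompletableFuture<Void> result = new CompletableFuture<>();

        MoveAppEvent(String appId, String targetQueue) {
            this.appId = appId;
            this.targetQueue = targetQueue;
        }
    }

    /** Stand-in for the scheduler-side handler; the empty-queue check is a placeholder. */
    static void handle(MoveAppEvent event) {
        if (event.targetQueue.isEmpty()) {
            event.result.completeExceptionally(
                new IllegalArgumentException("target queue not specified for " + event.appId));
        } else {
            event.result.complete(null);
        }
    }

    /** Hands the event to the dispatcher thread and returns its future immediately. */
    static CompletableFuture<Void> dispatch(ExecutorService dispatcher, MoveAppEvent event) {
        dispatcher.execute(() -> handle(event));
        return event.result;
    }

    /** Blocking style: wait for the outcome and turn a failure into a client-facing message. */
    static String moveBlocking(ExecutorService dispatcher, String appId, String queue) {
        try {
            dispatch(dispatcher, new MoveAppEvent(appId, queue)).join();
            return "moved";
        } catch (CompletionException e) {
            // join() wraps the handler's exception; the cause holds the reason.
            return "failed: " + e.getCause().getMessage();
        }
    }

    public static void main(String[] args) {
        ExecutorService dispatcher = Executors.newSingleThreadExecutor();
        try {
            // Blocking: the caller receives the failure reason directly.
            System.out.println(moveBlocking(dispatcher, "app_1", "root.research"));
            System.out.println(moveBlocking(dispatcher, "app_2", ""));

            // Non-blocking: register a callback on the same future.
            // The trailing join() only keeps this demo alive until the callback has run.
            dispatch(dispatcher, new MoveAppEvent("app_3", "root.research"))
                .whenComplete((ok, err) -> System.out.println(
                    err == null ? "app_3 moved" : "app_3 failed: " + err.getMessage()))
                .join();
        } finally {
            dispatcher.shutdown();
        }
    }
}
```

With this shape, the blocking path gives the client an error message without a request ID, while the callback path stays available if moves ever become frequent.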

> Allow moving apps between queues
> --------------------------------
>                 Key: YARN-1495
>                 URL: https://issues.apache.org/jira/browse/YARN-1495
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: scheduler
>    Affects Versions: 2.2.0
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
> This is an umbrella JIRA for work needed to allow moving YARN applications from one queue
to another.  The work will consist of additions in the command line options, additions in
the client RM protocol, and changes in the schedulers to support this.
> I have a picture of how this should function in the Fair Scheduler, but I'm not familiar
enough with the Capacity Scheduler to have one there.  Ultimately, the decision as to whether
an application can be moved should go down to the scheduler - some schedulers may wish not
to support this at all.  However, schedulers that do support it should share some common semantics
around ACLs and what happens to running containers.
> Here is how I see the general semantics working out:
> * A move request is issued by the client.  After it gets past ACLs, the scheduler checks
whether executing the move will violate any constraints. For the Fair Scheduler, these would
be queue maxRunningApps and queue maxResources constraints.
> * All running containers are transferred from the old queue to the new queue
> * All outstanding requests are transferred from the old queue to the new queue
> Here is how I see the ACLs of this working out:
> * To move an app from a queue a user must have modify access on the app or administer
access on the queue
> * To move an app to a queue a user must have submit access on the queue or administer
access on the queue 
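
The ACL rules and Fair Scheduler constraint checks quoted above can be sketched as a single pre-move check. This is only an illustration under stated assumptions: UserAccess, QueueState and checkMove are hypothetical types and names, not YARN classes; resources are reduced to memory in MB; and the app is assumed to be running, so it counts against maxRunningApps in the target queue.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class MoveCheckSketch {
    /** Hypothetical summary of a user's rights; not a real YARN ACL type. */
    static final class UserAccess {
        final boolean modifyApp;             // modify access on the app being moved
        final Set<String> administerQueues;  // queues the user may administer
        final Set<String> submitQueues;      // queues the user may submit to

        UserAccess(boolean modifyApp, Set<String> administerQueues, Set<String> submitQueues) {
            this.modifyApp = modifyApp;
            this.administerQueues = administerQueues;
            this.submitQueues = submitQueues;
        }
    }

    /** Hypothetical snapshot of the target queue (memory only, for brevity). */
    static final class QueueState {
        final int runningApps;
        final int maxRunningApps;
        final long usedMb;
        final long maxMb;

        QueueState(int runningApps, int maxRunningApps, long usedMb, long maxMb) {
            this.runningApps = runningApps;
            this.maxRunningApps = maxRunningApps;
            this.usedMb = usedMb;
            this.maxMb = maxMb;
        }
    }

    /** Moving out: modify access on the app, or administer access on the source queue. */
    static boolean mayMoveFrom(UserAccess user, String sourceQueue) {
        return user.modifyApp || user.administerQueues.contains(sourceQueue);
    }

    /** Moving in: submit access or administer access on the target queue. */
    static boolean mayMoveTo(UserAccess user, String targetQueue) {
        return user.submitQueues.contains(targetQueue)
            || user.administerQueues.contains(targetQueue);
    }

    /** Returns "ok", or the reason the move is refused, so it can be reported to the client. */
    static String checkMove(UserAccess user, String source, String target,
                            QueueState targetState, long appUsedMb) {
        if (!mayMoveFrom(user, source)) {
            return "no modify access on app or administer access on " + source;
        }
        if (!mayMoveTo(user, target)) {
            return "no submit or administer access on " + target;
        }
        if (targetState.runningApps + 1 > targetState.maxRunningApps) {
            return "move would exceed maxRunningApps on " + target;
        }
        if (targetState.usedMb + appUsedMb > targetState.maxMb) {
            return "move would exceed maxResources on " + target;
        }
        return "ok";
    }

    public static void main(String[] args) {
        Set<String> none = new HashSet<>();
        Set<String> canSubmit = new HashSet<>(Arrays.asList("root.b"));
        UserAccess owner = new UserAccess(true, none, canSubmit);
        QueueState roomy = new QueueState(1, 5, 1024, 8192);
        QueueState full = new QueueState(5, 5, 1024, 8192);

        System.out.println(checkMove(owner, "root.a", "root.b", roomy, 2048));
        System.out.println(checkMove(owner, "root.a", "root.b", full, 2048));
        System.out.println(checkMove(owner, "root.a", "root.c", roomy, 2048));
    }
}
```

Only once such a check passes would the running containers and outstanding requests be transferred to the new queue; that transfer is scheduler-specific and is left out of the sketch.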

This message was sent by Atlassian JIRA