hadoop-yarn-issues mailing list archives

From "Carlo Curino (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol
Date Mon, 12 Aug 2013 15:50:50 GMT

    [ https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736970#comment-13736970 ]

Carlo Curino commented on YARN-624:

Robert, you are right, and you provide a compelling example of an application that has dynamic resource needs.

There are ways around this: you can dynamically negotiate an increase/decrease of dedicated resources and keep the AM as it is. Philosophically, this keeps all AM-RM interaction best-effort and partial-ok, while it is the client-RM protocol that handles binding negotiation for resources. This would work and would match the current preemption mechanics well, but I am not sure it is the best design (I haven't thought hard about it yet).
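As a rough illustration of the two semantics, here is a minimal Python sketch. `Request`, `grant_partial`, and `grant_gang` are hypothetical names, not YARN API; the sketch counts identical containers only and ignores container sizes, locality, and priorities.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical request: an app asking for N identical containers."""
    app: str
    containers: int


def grant_partial(req: Request, free: int) -> int:
    """Best-effort, partial-ok semantics: grant whatever fits right now."""
    return min(req.containers, free)


def grant_gang(req: Request, free: int) -> int:
    """All-or-nothing (gang) semantics: grant the whole request or nothing."""
    return req.containers if req.containers <= free else 0


req = Request("mpi-job", 8)
print(grant_partial(req, 5))  # 5
print(grant_gang(req, 5))     # 0
print(grant_gang(req, 8))     # 8
```

With 5 free containers, an 8-container request gets 5 under partial-ok semantics but nothing as a gang; the gang is granted only once all 8 fit.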

If we go with the design where the AM makes gang-like requests, we should make the preemption policy aware of gangs so that it can act accordingly. In a sense, this boils down to a "granularity" problem, not too different from today's mismatch between the size of the containers available to preempt and the capacity we actually need back. But gangs could stretch that precision issue by a huge factor, making the trade-off between under- and over-preempting a much finer line to walk.
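The granularity problem can be sketched in a few lines of Python. `reclaim` is a hypothetical helper that preempts whole units (single containers, or entire gangs if gangs are preempted atomically) in the given order until the needed capacity is covered; sizes are in arbitrary units, e.g. GB.

```python
def reclaim(units, need):
    """Preempt whole units, in order, until at least `need` is reclaimed.

    `units` lists the sizes of preemptable units (hypothetical: a single
    container, or an entire gang if gangs are preempted atomically).
    Returns (sizes of preempted units, total reclaimed).
    """
    preempted, total = [], 0
    for size in units:
        if total >= need:
            break  # enough capacity recovered; stop preempting
        preempted.append(size)
        total += size
    return preempted, total


# Same 4 GB need, three different unit granularities.
for units in ([2] * 10, [3] * 10, [64, 2, 2]):
    preempted, total = reclaim(units, need=4)
    print(preempted, "over-preempted by", total - 4)
```

With 2 GB containers nothing is wasted; with 3 GB containers we over-preempt by 2 GB, which is today's precision issue. If a 64 GB gang is first in line, the same 4 GB need reclaims 64 GB.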

Two ways around this:
* We might want to introduce non-strictly-FIFO preemption within a queue, i.e., skip a large gang and preempt containers from the next app if the gang is much bigger than our preemption needs. This risks breaking reservations, and has possibly odd and gameable semantics. It also seems hard to gain enough experience to know how to parameterize such heuristics.

* An alternative workaround is to ensure that no gang request is satisfied with over-capacity containers; this keeps gangs off the preemption radar. A simple way to enforce this is to set max-capacity equal to guaranteed capacity for the queues that will serve gang requests. (This might also combine nicely with the dynamic-negotiation approach.)
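Both workarounds can be sketched in Python under simplifying assumptions. `pick_victims` is a hypothetical non-strictly-FIFO heuristic: it walks apps in preemption order and skips any unit (container or gang) larger than `skip_factor` times the capacity still needed; `skip_factor` is exactly the kind of knob that would be hard to parameterize. `admit_gang` is a hypothetical admission check that grants a gang only if it fits entirely within the queue's guaranteed capacity.

```python
def pick_victims(apps, need, skip_factor):
    """Hypothetical non-strictly-FIFO victim selection.

    `apps` is a list of (app_id, [unit sizes]) in preemption order; a unit is
    one container or a whole gang. Any unit larger than `skip_factor` times
    the capacity still needed is skipped instead of preempted. May return
    less than `need` if every remaining unit gets skipped.
    """
    victims, remaining = [], need
    for app_id, units in apps:
        for size in units:
            if remaining <= 0:
                return victims
            if size > skip_factor * remaining:
                continue  # gang far bigger than what we need: skip it
            victims.append((app_id, size))
            remaining -= size
    return victims


def admit_gang(used, gang_size, guaranteed):
    """Hypothetical admission check: grant a gang only if all of it fits
    within the queue's guaranteed capacity, so none of its containers is
    ever over-capacity (and hence a preemption candidate)."""
    return used + gang_size <= guaranteed


apps = [("app1", [64]), ("app2", [2, 2, 2])]
print(pick_victims(apps, need=4, skip_factor=2))             # [('app2', 2), ('app2', 2)]
print(pick_victims(apps, need=4, skip_factor=float("inf")))  # [('app1', 64)]
print(admit_gang(used=60, gang_size=30, guaranteed=100))     # True
print(admit_gang(used=80, gang_size=30, guaranteed=100))     # False
```

With `skip_factor=float("inf")` the heuristic degenerates to strict FIFO and preempts the 64 GB gang. In the Capacity Scheduler, the blunt version of the second workaround corresponds to setting `yarn.scheduler.capacity.<queue-path>.maximum-capacity` equal to `yarn.scheduler.capacity.<queue-path>.capacity` for the gang-serving queues, which caps the whole queue rather than only its gang requests.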

Another sub-problem of gang scheduling is tracking which containers belong to which gang (and/or which requests they serve). This also requires the AM to be consistent in how it uses the containers it receives, and possibly a more explicit protocol to say "this container I am giving you is part of that gang request"; otherwise a single preemption might break multiple topologies. In general, this containers-to-requests tracking seems a bit too opaque at the moment (I have heard independent complaints about it from ApplicationMaster developers before).

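A minimal sketch of AM-side bookkeeping, in Python, assuming the AM itself records which gang each container was assigned to, since the protocol does not carry this today. `GangTracker`, the container ids, and the gang ids are all hypothetical. When one container is preempted, the tracker reports the other members of its gang, which the AM would then release or reschedule; because each container maps to exactly one gang, one preemption breaks at most one topology.

```python
from collections import defaultdict


class GangTracker:
    """Hypothetical AM-side map of which container serves which gang."""

    def __init__(self):
        self._gang_of = {}                # container id -> gang id
        self._members = defaultdict(set)  # gang id -> container ids

    def assign(self, container_id, gang_id):
        """Record that a granted container is being used for `gang_id`."""
        self._gang_of[container_id] = gang_id
        self._members[gang_id].add(container_id)

    def on_preempted(self, container_id):
        """Forget the preempted container's whole gang and return its other
        members, which are now useless to an all-or-nothing computation."""
        gang_id = self._gang_of.pop(container_id, None)
        if gang_id is None:
            return set()  # unknown container, or its gang was already broken
        others = self._members.pop(gang_id)
        others.discard(container_id)
        for other in others:
            del self._gang_of[other]
        return others


tracker = GangTracker()
tracker.assign("c1", "g1")
tracker.assign("c2", "g1")
tracker.assign("c3", "g2")
print(tracker.on_preempted("c1"))  # {'c2'}: only gang g1 is broken
print(tracker.on_preempted("c3"))  # set(): g2 had no other members
```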
> Support gang scheduling in the AM RM protocol
> ---------------------------------------------
>                 Key: YARN-624
>                 URL: https://issues.apache.org/jira/browse/YARN-624
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api, scheduler
>    Affects Versions: 2.0.4-alpha
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
> Per discussion on YARN-392 and elsewhere, gang scheduling, in which a scheduler runs a set of tasks when they can all be run at the same time, would be a useful feature for YARN schedulers to support.
> Currently, AMs can approximate this by holding on to containers until they get all the ones they need.  However, this lends itself to deadlocks when different AMs are waiting on the same containers.
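The deadlock described in the issue can be reproduced with a toy Python simulation. The assumptions are all hypothetical: containers are identical, AMs request one container per turn in round-robin order, and an AM hoards what it holds until it has its full gang, then runs and releases everything.

```python
def simulate(free, needs):
    """Toy model of AMs approximating gang scheduling by hoarding.

    Each turn, every unfinished AM grabs one free container (if any) and
    keeps it. An AM holding its full gang runs, then releases everything.
    Returns the AMs that finished; any AM missing from the result is stuck.
    """
    held = {am: 0 for am in needs}
    finished = []
    while len(finished) < len(needs):
        progress = False
        for am, need in needs.items():
            if am in finished:
                continue
            if held[am] < need and free > 0:
                held[am] += 1
                free -= 1
                progress = True
            if held[am] == need:
                free += held[am]  # job ran; release the whole gang
                held[am] = 0
                finished.append(am)
                progress = True
        if not progress:
            break  # nobody could grab a container or finish: deadlock
    return finished


print(simulate(4, {"am1": 3, "am2": 3}))  # []: each AM holds 2 and waits forever
print(simulate(6, {"am1": 3, "am2": 3}))  # ['am1', 'am2']
print(simulate(4, {"am1": 3}))            # ['am1']: alone, it fits
```

With 4 free containers and two AMs each needing 3, both stall holding 2, even though either AM could have run alone.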

This message is automatically generated by JIRA.
