hadoop-yarn-issues mailing list archives

From "Nathan Roberts (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers
Date Tue, 19 Jan 2016 17:02:40 GMT

    [ https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107009#comment-15107009 ]

Nathan Roberts commented on YARN-1011:

bq. Welcome any thoughts/suggestions on handling promotion if we allow applications to ask
for only guaranteed containers. I'll continue brainstorming. We want to have a simple mechanism,
if possible; complex protocols seem to find a way to hoard bugs.

I agree that we want something simple and this probably doesn’t qualify, but below are some
thoughts anyway. 

This seems like a difficult problem. Maybe a webex would make sense at some point to go over
the design and work through some of these issues?

Maybe we need to run two schedulers, conceptually anyway. One of them is exactly what we have
today, call it the “GUARANTEED” scheduler. The second one is responsible for the “OPPORTUNISTIC”
space. What I like about this sort of approach is that we aren’t changing the way the GUARANTEED
scheduler would do things. The GUARANTEED scheduler assigns containers in the same order as
it always has, regardless of whether or not opportunistic containers are being allocated in
the background. By having separate schedulers, we’re not perturbing the way user_limits,
capacity limits, reservations, preemption, and other scheduler-specific fairness algorithms
deal with opportunistic capacity (I’m concerned we’ll have lots of bugs in this area).
The only difference is that the OPPORTUNISTIC side might already be running a container when
the GUARANTEED scheduler gets around to the same piece of work (the promotion problem). What
I don't like is that it's obviously not simple.
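To make the split concrete, here is a rough sketch of a scheduling round with two schedulers. All names here are illustrative, not actual YARN classes; the point is only that the GUARANTEED pass runs first and unchanged, and the OPPORTUNISTIC pass only sees what is left over:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the two-scheduler idea: the GUARANTEED pass is the
// scheduler we have today, untouched; the OPPORTUNISTIC pass runs afterwards
// with its own policy over whatever headroom remains. Not real YARN API.
public class TwoSchedulerSketch {

    interface Scheduler {
        // Returns the asks it chose to satisfy this round.
        List<String> assign(List<String> pendingAsks);
    }

    // The GUARANTEED scheduler assigns in the same order as it always has;
    // the OPPORTUNISTIC scheduler only considers the remaining asks.
    static List<String> scheduleRound(Scheduler guaranteed,
                                      Scheduler opportunistic,
                                      List<String> asks) {
        List<String> remaining = new ArrayList<>(asks);
        remaining.removeAll(guaranteed.assign(remaining));
        remaining.removeAll(opportunistic.assign(remaining));
        return remaining; // still unscheduled after both passes
    }
}
```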
- The OPPORTUNISTIC scheduler could behave very differently from the GUARANTEED scheduler
(e.g. it could only consider applications in certain queues, it could heavily favor applications
with quick running containers, it could randomly select applications to fairly use OPPORTUNISTIC
space, it could ignore reservations, it could ignore user limits, it could work extra hard
to get good container locality, etc.)
- When the OPPORTUNISTIC scheduler launches a container, it modifies the ask to indicate this
portion has been launched opportunistically; the size of the ask does not change (this means
the application needs to be aware that it is launching an OPPORTUNISTIC container)
- Like Bikas already mentioned, we have to promote opportunistic containers, even if it means
shooting an opportunistic one and launching a guaranteed one somewhere else.
- If the GUARANTEED scheduler decides to assign a container y to a portion of an ask that
has already been opportunistically launched with container x, the AM is asked to migrate container
x to container y. If x and y are on the same host, great, the AM asks the NM to convert x
to y (mostly bookkeeping); if not the AM kills x and launches y. Probably need a new state
to track the migration.
- Maybe locality would make the killing of opportunistic containers a rare event? If both
schedulers are working hard to get locality (e.g. YARN-80 gets us to about 80% node local),
then it seems like the GUARANTEED scheduler is going to usually pick the same nodes as the
OPPORTUNISTIC scheduler, resulting in very simple container conversions with no lost work.
- I don’t see how we can get away from occasionally shooting an opportunistic container
so that a guaranteed one can run somewhere else. Given that we want opportunistic space to
be used for both SLA and non-SLA work, we can’t wait around for a low priority opportunistic
container on a busy node. Ideally the OPPORTUNISTIC scheduler would be good at picking containers
that almost never get shot. 
- When the GUARANTEED scheduler assigns a container to a node, the over-allocate thresholds
could be violated; in this case, OPPORTUNISTIC containers on the node need to be shot. It
would be good if this didn't happen when a simple conversion was going to occur anyway.
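The convert-vs-kill decision in the migration bullet above could be as simple as the following sketch. The names (`promote`, the host parameters) are hypothetical, not the real AM/NM protocol:

```java
import java.util.Objects;

// Hypothetical sketch of the AM-side migration step: x is a running
// OPPORTUNISTIC container, y is the GUARANTEED container the scheduler
// just assigned to the same portion of the ask.
public class PromotionSketch {

    enum Action { CONVERT_IN_PLACE, KILL_AND_RELAUNCH }

    static Action promote(String hostOfX, String hostOfY) {
        // Same host: ask the NM to convert x to y (mostly bookkeeping),
        // no lost work.
        if (Objects.equals(hostOfX, hostOfY)) {
            return Action.CONVERT_IN_PLACE;
        }
        // Different host: kill x and launch y, losing x's work.
        return Action.KILL_AND_RELAUNCH;
    }
}
```

If both schedulers get good locality, the CONVERT_IN_PLACE branch should be the common case and KILL_AND_RELAUNCH the rare one.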

Given the complexities of this problem, we're going to experiment with a simpler approach
of over-allocating up to 2-3X on memory, with the NM shooting containers (preemptable containers
first) when resources are dangerously low. The over-allocate will be dynamic based on current
node usage (when the node is idle, no over-allocate; basically there has to be some evidence that
over-allocating will be successful before we actually over-allocate). This type of approach
might not satisfy all use cases, but it might turn out to be very simple and mostly effective.
We'll report back on how this type of approach works out.
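For illustration, the NM-side dynamic over-allocate could look something like the sketch below. The method and parameter names are hypothetical, as is the exact policy (here: only over-allocate when the node is fully allocated yet under-utilized, capped at `maxFactor` times physical memory):

```java
// Illustrative sketch of dynamic over-allocation: advertise extra capacity
// only when observed utilization shows allocated-but-unused memory to
// harvest; an idle or lightly allocated node over-allocates nothing.
public class OverAllocateSketch {

    // physicalMb: node memory; allocatedMb: memory promised to containers;
    // utilizedMb: memory actually in use; maxFactor: cap, e.g. 2.0-3.0.
    static long extraAllocatableMb(long physicalMb, long allocatedMb,
                                   long utilizedMb, double maxFactor) {
        // No evidence over-allocating will succeed unless the node is
        // already fully allocated yet under-utilized.
        if (allocatedMb < physicalMb) {
            return 0;
        }
        long slackMb = allocatedMb - utilizedMb;             // promised but unused
        long capMb = (long) (physicalMb * maxFactor) - allocatedMb;
        return Math.max(0, Math.min(slackMb, capMb));
    }
}
```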

> [Umbrella] Schedule containers based on utilization of currently allocated containers
> -------------------------------------------------------------------------------------
>                 Key: YARN-1011
>                 URL: https://issues.apache.org/jira/browse/YARN-1011
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Arun C Murthy
>         Attachments: yarn-1011-design-v0.pdf, yarn-1011-design-v1.pdf, yarn-1011-design-v2.pdf
> Currently RM allocates containers and assumes resources allocated are utilized.
> RM can, and should, get to a point where it measures utilization of allocated containers
and, if appropriate, allocate more (speculative?) containers.

This message was sent by Atlassian JIRA
