hadoop-yarn-issues mailing list archives

From "Nathan Roberts (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers
Date Tue, 05 Jan 2016 21:48:40 GMT

    [ https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15083880#comment-15083880 ]

Nathan Roberts commented on YARN-1011:

Very excited about this feature and agree that we should make this as simple as possible in
the first go around. I have a couple of initial questions. 

bq. As soon as we realize the perf is slower because the node has higher usage than we had
anticipated, we preempt the container and retry allocation (guaranteed or opportunistic depending
on the new cluster state). So, it shouldn't run slower for longer than our monitoring interval.
Is this assumption okay?

This seems hard (see [~bikassaha]'s comment above).

All of this basically boils down to the fact that preempting a container means lost work,
so the decision to preempt something shouldn't be taken lightly. For resources like memory
we have to react quickly, and that's fine. But for things like CPU, I'm personally ok with
latency on the order of single digit minutes so that natural container churn almost always
avoids preemption.
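
One way to picture this is a per-resource grace period before preemption is even considered. The sketch below is only an illustration: the function name, the resource keys, and the values (0 seconds for memory, 300 seconds for CPU) are assumptions made up for this example, not anything from the design doc or from YARN.

```python
# Hypothetical grace periods in seconds. Memory pressure is acted on
# immediately; CPU pressure is tolerated for a few minutes so that
# natural container churn has a chance to relieve it first.
GRACE_SECONDS = {"memory": 0, "cpu": 300}


def should_preempt(resource, overloaded_since, now, grace=GRACE_SECONDS):
    """Return True if the node has been overloaded on `resource` for at
    least the grace period configured for that resource.

    `overloaded_since` is the timestamp (seconds) at which the overload
    was first observed, or None if the node is not currently overloaded.
    """
    if overloaded_since is None:
        return False
    return now - overloaded_since >= grace[resource]
```

With these assumed values, a node that has been CPU-bound for one minute would not lose any containers, while a node over its memory threshold would be acted on at the next monitoring interval.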

Because of that complexity, I'm not 100% convinced that disfavoring OPPORTUNISTIC containers
(e.g. low value for cpu_shares) is something that buys us very much. If we were oversubscribing
10X then I'd probably want it for sure, but if it's at most 2X capacity then worst case is
a container only gets 50% of the resource it had requested. Obviously for something like memory
this has to be closely controlled because going over the physical capabilities of the machine
has very significant consequences. But for CPU, I'd definitely be inclined to live with the
occasional 50% worst case for all containers, in order to avoid the 1/1024th worst case for
OPPORTUNISTIC containers on a busy node.
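
To make the arithmetic concrete, here is a rough sketch of both worst cases. It assumes a fully contended CPU and simple proportional-share scheduling (each container gets its weight divided by the sum of all weights, which is how cpu_shares behaves under contention). The function names are made up for this example, and the weight of 1 against the default of 1024 is an assumption chosen only to reproduce the roughly 1/1024 figure.

```python
def worst_case_fraction(oversubscription_factor):
    """Fraction of its requested CPU a container gets when every
    container on the node is busy and all are weighted equally."""
    return 1.0 / oversubscription_factor


def contended_shares(weights):
    """Fraction of a fully contended CPU that each container receives
    under proportional-share scheduling with the given weights."""
    total = sum(weights)
    return [w / total for w in weights]


# Equal treatment: 2X oversubscription halves everyone in the worst case.
print(worst_case_fraction(2))    # 0.5
print(worst_case_fraction(10))   # 0.1

# Disfavoring: one GUARANTEED container at weight 1024 next to one
# OPPORTUNISTIC container at weight 1 (assumed values).
print(contended_shares([1024, 1]))
```

The second case is the one I'd like to avoid: the opportunistic container is left with roughly a thousandth of the CPU whenever the guaranteed container is busy.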

So, hopefully we can make the policy quite configurable so that the amount of disfavoring
can be tuned for various workloads.

bq. In practice, I expect admins to come up with a reasonable threshold for over-subscription:
e.g. 0.8 - we only oversubscribe up to 80% of capacity advertised through yarn.nodemanager.resource.*.
Thinking more about this, this threshold should have an upper limit - 0.95?

Can we make this per-resource (80% memory, 120% CPU)?
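
As a sketch of what per-resource thresholds might look like, the check below compares projected usage against threshold times advertised capacity, separately for each resource. This is just one possible reading of the threshold; the dictionary keys, the function name, and the numbers are hypothetical and do not correspond to existing YARN configuration properties.

```python
# Hypothetical per-resource over-subscription thresholds, expressed as a
# fraction of the capacity the node advertises for that resource.
THRESHOLDS = {"memory_mb": 0.8, "vcores": 1.2}


def can_oversubscribe(capacity, used, request, thresholds=THRESHOLDS):
    """Return True if adding `request` keeps projected usage at or below
    threshold * advertised capacity for every resource."""
    for resource, limit in thresholds.items():
        projected = used[resource] + request[resource]
        if projected > limit * capacity[resource]:
            return False
    return True


capacity = {"memory_mb": 10000, "vcores": 10}
used = {"memory_mb": 7000, "vcores": 10}

# Fits: memory stays under 80%, CPU stays under 120%.
print(can_oversubscribe(capacity, used, {"memory_mb": 500, "vcores": 1}))   # True
# Rejected: memory would reach 90% even though CPU still has headroom.
print(can_oversubscribe(capacity, used, {"memory_mb": 2000, "vcores": 1}))  # False
```

The point is that memory can be kept conservative while CPU is allowed to go past 100% of the advertised capacity.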

> [Umbrella] Schedule containers based on utilization of currently allocated containers
> -------------------------------------------------------------------------------------
>                 Key: YARN-1011
>                 URL: https://issues.apache.org/jira/browse/YARN-1011
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Arun C Murthy
>         Attachments: yarn-1011-design-v0.pdf, yarn-1011-design-v1.pdf
> Currently RM allocates containers and assumes resources allocated are utilized.
> RM can, and should, get to a point where it measures utilization of allocated containers
> and, if appropriate, allocate more (speculative?) containers.

This message was sent by Atlassian JIRA
