hadoop-yarn-issues mailing list archives

From "Karthik Kambatla (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers
Date Tue, 05 Jan 2016 15:29:40 GMT

    [ https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15083223#comment-15083223 ]

Karthik Kambatla edited comment on YARN-1011 at 1/5/16 3:28 PM:
----------------------------------------------------------------

We would run an opportunistic container on a node only if the actual utilization is less than
the allocation by a margin bigger than the allocation of said opportunistic container. We
reactively preempt the opportunistic container if the actual utilization goes over a threshold.
To address spikes in usage where our reactive measures are too slow to kick in, we run the
opportunistic containers at a strictly lower priority. 
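The admission and preemption rules above can be sketched roughly as follows. This is only an illustration, not YARN code: the `NodeState` type, the function names, the use of memory in MB as the only resource, and the comparison of utilization against a fraction of node capacity are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    """Hypothetical per-node snapshot; not an actual YARN class."""
    allocated_mb: int  # sum of allocations handed out on the node
    utilized_mb: int   # measured usage on the node


def can_run_opportunistic(node: NodeState, request_mb: int) -> bool:
    """Admit only if utilization is below allocation by more than the request."""
    slack_mb = node.allocated_mb - node.utilized_mb
    return slack_mb > request_mb


def should_preempt(node: NodeState, capacity_mb: int, threshold: float) -> bool:
    """Reactive check: preempt once measured utilization exceeds the threshold.

    Assumes the threshold is a fraction of the node's advertised capacity.
    """
    return node.utilized_mb > threshold * capacity_mb
```

A monitoring loop would call `should_preempt` once per interval. Running opportunistic containers at a strictly lower priority covers spikes that occur between two such checks.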

bq. the app got opportunistic containers and their perf wasn't the same as normal containers
- so it ran slower. 
As soon as we realize the perf is slower because the node has higher usage than we had anticipated,
we preempt the container and retry allocation (guaranteed or opportunistic depending on the
new cluster state). So, it shouldn't run slower for longer than our monitoring interval. Is
this assumption okay? 

bq. However, things get complicated because a node with an opportunistic container may continue
to run its normal containers while space frees up for guaranteed capacity on other nodes.
The opportunistic container will continue to run on this node so long as it is getting the
resources it needs. If there is any sort of resource contention, it is preempted and is up
for allocation on one of the free nodes. 

bq. This would require that the system upgrade opportunistic containers in the same order
as it would allocate containers.
bq. IMO, the NM cannot make a local choice about upgrading its opportunistic containers because
this is effectively a resource allocation decision and only the RM has the info to do that.
The RM schedules the next highest priority "task" for which it couldn't find a guaranteed
container as an opportunistic container. This task continues to run as long as it is getting
enough resources, that is, as long as there is no resource contention. If guaranteed
resources free up on the node it is running on, isn't it fair to promote the container to Guaranteed?
After all, if the unused resources were not hidden behind other containers' allocations and
were actually available as guaranteed capacity on that node initially, the RM would just have scheduled
a guaranteed container in the first place.
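A minimal sketch of the promotion check argued for above, assuming memory in MB as the only resource. The function name and parameters are hypothetical, and the sketch does not settle whether the RM or the NM evaluates the condition.

```python
def can_promote(capacity_mb: int, guaranteed_allocated_mb: int, container_mb: int) -> bool:
    """True if freed-up guaranteed capacity on the node can absorb the container.

    capacity_mb: node capacity available for guaranteed containers (assumed input)
    guaranteed_allocated_mb: sum of current guaranteed allocations on the node
    container_mb: allocation of the opportunistic container being considered
    """
    free_guaranteed_mb = capacity_mb - guaranteed_allocated_mb
    return container_mb <= free_guaranteed_mb
```

For example, on an 8192 MB node with 6144 MB of guaranteed allocations, a 2048 MB opportunistic container would qualify for promotion. With 7168 MB already allocated it would not.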

I should probably clarify that the proposal here targets those cases where users' estimates
are significantly off from reality and there are enough free resources per node to run additional
task(s) without causing any resource contention. Even though this is the norm, we want to
guard against spikes in usage to avoid perf regressions. In practice, I expect admins to come
up with a reasonable threshold for over-subscription: e.g. 0.8 - we oversubscribe only
up to 80% of the capacity advertised through {{yarn.nodemanager.resource.*}}. Thinking more about
this, this threshold should have an upper limit - 0.95? 
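One way to read the threshold is sketched below. It assumes the threshold is a fraction of the node's advertised capacity and that opportunistic work may fill the gap between measured utilization and that fraction. The function names are hypothetical, and 0.95 is only the upper limit floated above, not a settled value.

```python
MAX_THRESHOLD = 0.95  # tentative upper limit from the discussion, not a settled value


def effective_threshold(configured: float, upper: float = MAX_THRESHOLD) -> float:
    """Clamp the admin-configured over-subscription threshold to the upper limit."""
    if not 0.0 < configured <= 1.0:
        raise ValueError("threshold must be in (0, 1]")
    return min(configured, upper)


def oversubscription_budget_mb(capacity_mb: int, utilized_mb: int, configured: float) -> float:
    """Memory (MB) still available to opportunistic containers on a node."""
    limit_mb = effective_threshold(configured) * capacity_mb
    return max(0.0, limit_mb - utilized_mb)
```

With a 10000 MB node, 5000 MB measured utilization, and a threshold of 0.8, this leaves a budget of about 3000 MB for opportunistic containers. A configured value of 0.99 would be clamped to 0.95.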




> [Umbrella] Schedule containers based on utilization of currently allocated containers
> -------------------------------------------------------------------------------------
>
>                 Key: YARN-1011
>                 URL: https://issues.apache.org/jira/browse/YARN-1011
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Arun C Murthy
>         Attachments: yarn-1011-design-v0.pdf, yarn-1011-design-v1.pdf
>
>
> Currently RM allocates containers and assumes resources allocated are utilized.
> RM can, and should, get to a point where it measures utilization of allocated containers
> and, if appropriate, allocate more (speculative?) containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
