hadoop-yarn-issues mailing list archives

From "Nathan Roberts (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers
Date Wed, 06 Jan 2016 15:28:40 GMT

    [ https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085677#comment-15085677 ]

Nathan Roberts commented on YARN-1011:

bq. This is one of the reasons I was proposing the notion of a max threshold which is less
than 1. If the utilization goes to 100%, we clearly know there is contention. Since we measure
resource utilization in resource-seconds (if not, we should update it), bursty spikes alone
wouldn't take utilization over 100%. So, we shouldn't see a utilization greater than 100%.

Just to make sure I understand. When you say max threshold < 1 are you saying an NM could
not advertise 48 vcores if there are only 24 vcores physically available? I think we have
to support going above 1.0. We already go above 1.0 on our clusters, even without this feature.
What I'm thinking this feature will allow us to do is to go significantly above 1.0, especially
for resources like memory where we have to be much more careful about not hitting 100%. 
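To make the "going above 1.0" point concrete, here is a minimal sketch of what advertising overcommitted capacity could look like. The class and method names are purely illustrative, not actual YARN classes; the per-resource factors are hypothetical configuration, reflecting the idea that CPU can tolerate a much higher overcommit factor than memory.

```java
// Illustrative sketch only -- not real YARN code or configuration keys.
// An NM advertises more capacity than it physically has, scaled by a
// per-resource overcommit factor (e.g. 48 vcores on a 24-core node).
public class OvercommitSketch {

    static int advertisedVcores(int physicalVcores, double cpuOvercommitFactor) {
        // factor > 1.0 bets that allocated containers under-utilize their share
        return (int) Math.floor(physicalVcores * cpuOvercommitFactor);
    }

    static long advertisedMemoryMb(long physicalMemMb, double memOvercommitFactor) {
        // memory would use a much more conservative factor than CPU,
        // since hitting 100% memory forces corrective action
        return (long) Math.floor(physicalMemMb * memOvercommitFactor);
    }

    public static void main(String[] args) {
        System.out.println(advertisedVcores(24, 2.0));        // 48
        System.out.println(advertisedMemoryMb(65536, 1.25));  // 81920
    }
}
```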

One use case that I'm really hoping this feature can support is a batch cluster (loose SLAs)
with very high utilization. For this use case, I'd like the following to be true:
- nodes can be at 100% CPU, 100% Network, or 100% Disk for long periods of time (several minutes).
Memory could get to something like 80% before corrective action would be required. During
these periods, no containers get shot to shed load. Nodemanagers might reduce their available
resource advertised to the RM, but nothing would need to be killed.
- Both GUARANTEED and OPPORTUNISTIC containers get their fair share of resources. They're
both drawing from the same capacity and user-limit from the RM's point of view so I feel like
they should be given their fair set of resources on the nodes they execute on. The real point
of being designated OPPORTUNISTIC in this use case is that the NM knows which containers to
kill when it needs to shed load.  
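The kill-selection policy described above could be sketched roughly as follows. All names here are hypothetical, not YARN APIs: the point is only that when memory crosses the corrective threshold (say 80%), the NM sheds load by picking OPPORTUNISTIC containers as victims and never touches GUARANTEED ones.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only -- class and method names are not YARN APIs.
public class ShedLoadSketch {
    enum ExecType { GUARANTEED, OPPORTUNISTIC }

    static class Container {
        final String id;
        final ExecType type;
        final long memMb;
        Container(String id, ExecType type, long memMb) {
            this.id = id; this.type = type; this.memMb = memMb;
        }
    }

    // Returns OPPORTUNISTIC containers to kill, in list order, until
    // projected memory use drops back under threshold * capacity.
    // GUARANTEED containers are never selected.
    static List<Container> containersToKill(List<Container> running,
                                            long usedMemMb, long capacityMb,
                                            double threshold) {
        List<Container> victims = new ArrayList<>();
        long mem = usedMemMb;
        for (Container c : running) {
            if (mem <= threshold * capacityMb) {
                break; // back under the corrective threshold, stop killing
            }
            if (c.type == ExecType.OPPORTUNISTIC) {
                victims.add(c);
                mem -= c.memMb;
            }
        }
        return victims;
    }
}
```

In a real implementation the victim ordering would itself be a policy question (e.g. youngest first, or largest first), which is part of why the thresholds and fairness controls need to be configurable per cluster.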

Another use case is where you have a mixture of jobs, some with tight SLAs, some with looser
SLAs. This one is mentioned in previous comments and is also very important. It requires a
different set of thresholds and a different level of fairness controls. 

So, I just think things have to be configurable enough to handle both types of clusters. 

> [Umbrella] Schedule containers based on utilization of currently allocated containers
> -------------------------------------------------------------------------------------
>                 Key: YARN-1011
>                 URL: https://issues.apache.org/jira/browse/YARN-1011
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Arun C Murthy
>         Attachments: yarn-1011-design-v0.pdf, yarn-1011-design-v1.pdf
> Currently RM allocates containers and assumes resources allocated are utilized.
> RM can, and should, get to a point where it measures utilization of allocated containers
> and, if appropriate, allocate more (speculative?) containers.

This message was sent by Atlassian JIRA
