hadoop-yarn-issues mailing list archives

From "Carlo Curino (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy
Date Tue, 13 May 2014 11:30:16 GMT

    [ https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996282#comment-13996282 ]

Carlo Curino commented on YARN-2022:


The problem with AM_CONTAINER_PRIORITY is that it is just a shortcut for setting Priority = 0; the user can easily do the same from their own code, and unless there are explicit checks that prevent a ResourceRequest from assigning priority = 0 to all of an application's containers, we have no defense against user abuse. The two options I see are:
 * we track which container is the AM by some means other than Priority, and protect the AM container from preemption whenever possible
 * we assign a "quota" of protected-from-preemption containers, and save whichever containers have the lowest priority and fit within the "quota". This way the user can specify multiple containers at Priority=0 (think a replicated AM, or some other critical service for the job) and we will save as many of those as fit in the quota (see the sketch after this list).
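To make the second option concrete, here is a minimal sketch of quota-based protection. This is not ProportionalCapacityPreemptionPolicy code; the ContainerInfo type and the preemptableContainers helper are hypothetical names introduced just for illustration:

{code:java}
import java.util.*;

// Hypothetical sketch: protect up to `quota` of an application's
// lowest-priority containers (Priority=0 first, i.e., AMs and other
// critical services) from preemption; everything else stays preemptable.
final class QuotaProtection {
  static final class ContainerInfo {
    final String id;
    final int priority; // lower value = more critical (0 = AM by convention)
    ContainerInfo(String id, int priority) { this.id = id; this.priority = priority; }
  }

  static List<String> preemptableContainers(List<ContainerInfo> containers, int quota) {
    List<ContainerInfo> sorted = new ArrayList<>(containers);
    sorted.sort(Comparator.comparingInt(c -> c.priority)); // most critical first
    List<String> preemptable = new ArrayList<>();
    for (int i = quota; i < sorted.size(); i++) { // first `quota` entries are protected
      preemptable.add(sorted.get(i).id);
    }
    return preemptable;
  }

  public static void main(String[] args) {
    List<ContainerInfo> app = Arrays.asList(
        new ContainerInfo("c1", 0),   // primary AM
        new ContainerInfo("c2", 0),   // replicated AM / critical service
        new ContainerInfo("c3", 20),  // map task
        new ContainerInfo("c4", 20)); // map task
    // With quota = 2, both Priority=0 containers are saved.
    System.out.println(preemptableContainers(app, 2)); // prints [c3, c4]
  }
}
{code}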

I think we are agreeing on max-am-percentage... the final goal is to make sure that after preemption the max-am-resource-percent invariant is respected (i.e., no more than a certain fraction of the queue is dedicated to AMs).
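As a back-of-the-envelope illustration of that invariant (the percentage mirrors yarn.scheduler.capacity.maximum-am-resource-percent; the check and its inputs are hypothetical, not CapacityScheduler code):

{code:java}
// Illustrative post-preemption check: AM resources in the queue must not
// exceed the configured fraction of the queue's capacity. Inputs are
// assumed to be supplied by the caller; this is not CS code.
public final class AmShareInvariant {
  static boolean holds(long amUsedMB, long queueCapacityMB, double maxAmResourcePercent) {
    return amUsedMB <= (long) (queueCapacityMB * maxAmResourcePercent);
  }

  public static void main(String[] args) {
    // 8192 MB queue with a 10% AM cap: at most ~819 MB of AMs may remain.
    System.out.println(holds(2048, 8192, 0.1)); // false: the invariant is violated
  }
}
{code}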

The problem with user-limit-factor goes like this (a worked version of the numbers follows below):
 * Given a queue A with capacity = 10%, max-capacity = 50%, and user-limit-factor = 2 (i.e., a single user can go up to 20% of total resources)
 * Only one user is active in this queue, and it grows to 20% of resources (this also requires low activity in the other queues)
 * The overall cluster capacity is reduced (e.g., a failing rack), or a refresh of the queues has reduced this queue's capacity
 * The LeafQueue scheduler keeps "skipping" the scheduling for this user (since it is now over its user-limit-factor), although no other user in the cluster is asking for resources
   * If we ever get to this situation with the user holding only AMs, the system is completely wedged: the AMs wait for more containers, and the system systematically skips this user (as it is above its user-limit-factor).
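Here are the numbers of that scenario worked out (a hypothetical helper, not LeafQueue code; the single-user cap is taken to be queue-capacity * user-limit-factor, as in the example above):

{code:java}
// Worked numbers for the wedge described above.
public final class UserLimitWedge {
  public static void main(String[] args) {
    double queueCapacity = 0.10;   // queue A: 10% of cluster
    double userLimitFactor = 2.0;  // a single user may reach 2 * 10% = 20%
    long userUsedMB = 20_000;      // the lone user grew to the full 20% of a 100,000 MB cluster

    // A rack fails (or a refresh shrinks the queue): the user's absolute
    // limit shrinks with the cluster, but its usage does not.
    long shrunkClusterMB = 50_000;
    double userLimitMB = shrunkClusterMB * queueCapacity * userLimitFactor; // 10,000 MB

    // 20,000 MB held against a 10,000 MB limit: the scheduler keeps
    // skipping this user even though nobody else wants the resources.
    System.out.println("user over limit: " + (userUsedMB > userLimitMB)); // true
  }
}
{code}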
If preemption proceeds by systematically killing containers *including* AMs, the chances of this happening are rather low (the "head" of the queue holds only AMs, while the tail contains AMs and other containers), but as we "save" AMs from preemption, this bad corner case becomes somewhat more likely to happen.

What I am trying to get at with my comments is that, as we evolve preemption further, we should look at all the invariants of a queue and try to make sure that our preemption policy can re-establish not only the capacity invariant but all the other invariants as well. The CS relies on those invariants heavily, and misbehaves if they are violated. An example of this is YARN-1957, where we introduce better handling for max-capacity and zero-size queues.

The changes you are proposing are not "creating" the problem, just making it more likely to happen in practice. A well-tuned CS and a reasonable load are unlikely to trigger this, but we should build for robustness as much as possible, since we cannot rely on users to understand these internals and tune the CS defensively.

[~acmurthy] any thoughts on this?

> Preempting an Application Master container can be kept as least priority when multiple
applications are marked for preemption by ProportionalCapacityPreemptionPolicy
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
>                 Key: YARN-2022
>                 URL: https://issues.apache.org/jira/browse/YARN-2022
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>    Affects Versions: 2.4.0
>            Reporter: Sunil G
>            Assignee: Sunil G
>         Attachments: Yarn-2022.1.patch
> Cluster Size = 16GB [2NM's]
> Queue A Capacity = 50%
> Queue B Capacity = 50%
> Consider there are 3 applications running in Queue A which have taken the full cluster
> J1 = 2GB AM + 1GB * 4 Maps
> J2 = 2GB AM + 1GB * 4 Maps
> J3 = 2GB AM + 1GB * 2 Maps
> Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ].
> Currently in this scenario, Job J3 will get killed, including its AM.
> It is better if the AM can be given the least preemption priority when multiple applications are marked for preemption. In this same scenario, map tasks from J3 and J2 can be preempted instead.
> Later, when the cluster is free, maps can be re-allocated to these jobs.
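The arithmetic behind that expectation, as a quick illustrative sketch (not the actual ProportionalCapacityPreemptionPolicy computation, which also applies dead zones and rounding):

{code:java}
// Back-of-the-envelope for the scenario above.
public final class PreemptionArithmetic {
  public static void main(String[] args) {
    int clusterMB = 16 * 1024;                 // 16 GB cluster
    int queueBGuaranteedMB = clusterMB / 2;    // 50% = 8192 MB
    int j4DemandMB = 2 * 1024 + 2 * 1024;      // 2 GB AM + 2 x 1 GB maps = 4096 MB
    int toReclaimMB = Math.min(j4DemandMB, queueBGuaranteedMB); // 4096 MB

    // Four 1 GB map containers (e.g., two from J2 and two from J3) cover
    // the demand, so no AM needs to be preempted in this scenario.
    int mapContainerMB = 1024;
    System.out.println("maps to preempt: " + (toReclaimMB / mapContainerMB)); // 4
  }
}
{code}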
