hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4108) CapacityScheduler: Improve preemption to preempt only those containers that would satisfy the incoming request
Date Thu, 03 Sep 2015 14:06:47 GMT

https://issues.apache.org/jira/browse/YARN-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729100#comment-14729100

Jason Lowe commented on YARN-4108:

For the case where we are preempting containers but the other job/queue cannot take them because
of user limit: that's clearly a bug in the preemption logic and not related to the problem
of matching up preemption events to pending asks so we satisfy the preemption trigger.  We
should never be preempting containers if a user limit would foil our ability to reassign those
resources.  If we are, then the preemption logic is miscalculating the amount of pending demand
that triggered the preemption in the first place.
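To make that concrete, here is a minimal sketch of the idea (the class and method names are illustrative, not actual CapacityScheduler API): the pending demand that feeds the preemption trigger is clamped by the user's remaining headroom under the user limit, so we never preempt for resources a user limit would prevent us from reassigning.

```java
// Hypothetical sketch, not real scheduler code: clamp pending demand by
// the user's headroom before letting it trigger preemption.
public class PreemptionDemand {
    /** Pending demand (in MB) that should count toward triggering preemption. */
    static long effectivePendingDemand(long pendingAskMb, long userLimitMb,
                                       long userCurrentUsageMb) {
        long headroom = Math.max(0, userLimitMb - userCurrentUsageMb);
        // Demand beyond the user's headroom can never be satisfied,
        // so it must not cause other containers to be shot.
        return Math.min(pendingAskMb, headroom);
    }

    public static void main(String[] args) {
        // User asks for 8 GB but only has 2 GB of headroom left:
        assert effectivePendingDemand(8192, 10240, 8192) == 2048;
        // User already at the limit: nothing should trigger preemption.
        assert effectivePendingDemand(4096, 10240, 10240) == 0;
    }
}
```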

For the case where the pending ask that is triggering the preemption has very strict and narrow
locality requirements: yeah, that's a tough one.  If the locality requirement can be relaxed
then it's not too difficult -- by the time we preempt we'll have given up on looking for locality
anyway.  However if the locality requirement cannot be relaxed then preemption could easily
thrash wildly if the resources that can be preempted do not satisfy the pending ask.  We would
need to be very conscious of the request we're trying to satisfy -- preemption may not be able
to satisfy the request at all in some cases.
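One way to avoid that thrashing, sketched here with illustrative types (not YARN classes): when the ask has a hard, non-relaxable locality requirement, only containers on nodes matching that requirement are useful preemption candidates; anything else gets filtered out before we pick victims.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Illustrative sketch only -- names are assumptions, not scheduler API.
public class LocalityFilter {
    static List<String> preemptableNodes(List<String> candidateNodes,
                                         Set<String> requiredNodes,
                                         boolean relaxLocality) {
        if (relaxLocality) {
            // Locality will eventually be relaxed, so any node can help.
            return candidateNodes;
        }
        // Hard locality: preempting off-ask nodes cannot satisfy the request.
        return candidateNodes.stream()
                .filter(requiredNodes::contains)
                .collect(Collectors.toList());
    }
}
```

If the filtered list comes back empty, preemption cannot satisfy the request at all and should not fire.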

I was thinking along the reservation lines as well.  When we are trying to satisfy a request
on a busy cluster we already make a reservation on a node.  When we decide to preempt we can
move the request's reservation to the node where we decided to preempt containers.  The problem
is that we are now changing the algorithm for deciding what gets shot -- it used to be least
amount of work lost, but now with locality introduced into the equation there needs to be
a weighting of container duration and locality in the mix.
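A hypothetical cost function along those lines (the weights and names are assumptions, not a proposed final design): blend the work a container has already done with whether its node matches the pending ask's locality, rather than ranking victims purely by least work lost.

```java
// Sketch of a blended victim-selection cost; lower cost = better victim.
public class PreemptionCost {
    static double cost(long containerRuntimeMs, boolean nodeMatchesLocality,
                       double workWeight, double localityPenalty) {
        // Longer-running containers represent more lost work.
        double c = workWeight * containerRuntimeMs;
        if (!nodeMatchesLocality) {
            // Preempting here cannot satisfy the ask directly, so penalize it.
            c += localityPenalty;
        }
        return c;
    }
}
```

With a large enough locality penalty, a long-running container on the right node beats a short-running one on the wrong node, which is the trade-off described above.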

This would be a lot more straightforward if the scheduler wasn't trying to peephole optimize
by only looking at one node at a time when it schedules.  If the scheduler could look across
nodes and figure out which node "wins" in terms of sufficiently preemptable resources with
the lowest cost of preemption, then it could send the preemption requests/kills to the
containers on that node and move the reservation to that node.  Looking at only one node at
a time means we may have to do "scheduling opportunity" hacks to let it see enough nodes to
make a good decision.
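The cross-node comparison could look something like this sketch (illustrative data types, not YARN classes): among nodes whose preemptable resources can cover the ask, pick the one with the lowest total preemption cost, then move the reservation there.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of picking the "winning" node across the cluster.
public class NodePicker {
    record Node(String name, long preemptableMb, double preemptionCost) {}

    static Optional<Node> pickWinner(List<Node> nodes, long askMb) {
        return nodes.stream()
                // Only nodes with enough preemptable resources can satisfy the ask.
                .filter(n -> n.preemptableMb() >= askMb)
                // Among those, kill the cheapest set of containers.
                .min(Comparator.comparingDouble(Node::preemptionCost));
    }
}
```

An empty result would mean no single node can be made to fit the request, which again argues for not firing preemption blindly.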

> CapacityScheduler: Improve preemption to preempt only those containers that would satisfy the incoming request
> --------------------------------------------------------------------------------------------------------------
>                 Key: YARN-4108
>                 URL: https://issues.apache.org/jira/browse/YARN-4108
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
> This is sibling JIRA for YARN-2154. We should make sure container preemption is more
> *Requirements:*
> 1) Can handle case of user-limit preemption
> 2) Can handle case of resource placement requirements, such as: hard-locality (I only want to use rack-1) / node-constraints (YARN-3409) / black-list (I don't want to use rack1 and host\[1-3\])
> 3) Can handle preemption within a queue: cross user preemption (YARN-2113), cross application preemption (such as priority-based (YARN-1963) / fairness-based (YARN-3319)).

This message was sent by Atlassian JIRA