hadoop-yarn-issues mailing list archives

From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4059) Preemption should delay assignments back to the preempted queue
Date Thu, 03 Sep 2015 00:46:46 GMT

    [ https://issues.apache.org/jira/browse/YARN-4059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14728308#comment-14728308 ]

Wangda Tan commented on YARN-4059:

Thanks [~lichangleo].

I agree with [~jlowe]; maybe we need to modify the scheduler and the preemption policy together
so this can be handled better. I posted a few points to YARN-4108.

For the locality wait issue, I think it's caused more by how we calculate missed-opportunity
than by the preemption policy. Currently, missed-opportunity is updated only when an application
is accessed by the scheduler. If a cluster is highly utilized, e.g. 99% of nodes are occupied,
the missed-opportunity count can increase very slowly. IIRC, [~jlowe] mentioned this in other
JIRAs. Maybe we need to change heartbeat-based counting to time-based counting. [~jlowe],
would it be acceptable to you to add an option so CS can choose between heartbeat-based
counting and time-based counting?
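To make the two counting modes concrete, here is a minimal sketch of how a locality-wait check could support either mode. This is illustrative only: the class, field, and parameter names are hypothetical, not actual CapacityScheduler internals.

```java
// Hypothetical sketch of the two missed-opportunity counting modes discussed
// above. Names are illustrative, not real CapacityScheduler code.
public class LocalityWait {
    enum Mode { HEARTBEAT_COUNT, TIME_BASED }

    // Decide whether an app should give up waiting for node-local placement
    // and relax its locality constraint.
    static boolean relaxLocality(Mode mode,
                                 long missedOpportunities,     // heartbeat-based counter
                                 long waitStartMs, long nowMs, // time-based tracking
                                 long localityDelay,           // threshold, in heartbeats
                                 long avgHeartbeatIntervalMs) {
        switch (mode) {
            case HEARTBEAT_COUNT:
                // Counter advances only when the scheduler visits the app.
                // On a ~99% utilized cluster, those visits are rare, so this
                // threshold is reached very slowly.
                return missedOpportunities >= localityDelay;
            case TIME_BASED:
                // Equivalent wall-clock budget: keeps advancing even when
                // the scheduler seldom reaches the app.
                long budgetMs = localityDelay * avgHeartbeatIntervalMs;
                return (nowMs - waitStartMs) >= budgetMs;
            default:
                return true;
        }
    }
}
```

The time-based branch converts the same configured delay into a wall-clock budget, so it degrades gracefully when the scheduler rarely reaches the application.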

> Preemption should delay assignments back to the preempted queue
> ---------------------------------------------------------------
>                 Key: YARN-4059
>                 URL: https://issues.apache.org/jira/browse/YARN-4059
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Chang Li
>            Assignee: Chang Li
>         Attachments: YARN-4059.2.patch, YARN-4059.3.patch, YARN-4059.patch
> When preempting containers from a queue it can take a while for the other queues to fully
consume the resources that were freed up, due to delays waiting for better locality, etc.
Those delays can cause the resources to be assigned back to the preempted queue, and then
the preemption cycle continues.
> We should consider adding a delay, either based on node heartbeat counts or time, to
avoid granting containers to a queue that was recently preempted. The delay should be sufficient
to cover the cycles of the preemption monitor, so we won't try to assign containers in-between
preemption events for a queue.
> Worst-case scenario for assigning freed resources to other queues is when all the other
queues want no locality. No locality means only one container is assigned per heartbeat, so
we need to wait for the time it takes the entire cluster to heartbeat in, multiplied by the
number of containers that could run on a single node.
> So the "penalty time" for a queue should be the max of either the preemption monitor
cycle time or the amount of time it takes to allocate the cluster with one container per heartbeat.
Guessing this will be somewhere around 2 minutes.
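The "penalty time" arithmetic from the description can be sketched as follows. This is a hypothetical helper, not YARN code, and the sample numbers (3s monitor cycle, 1s heartbeat interval, 120 containers per node) are illustrative assumptions chosen only to show how the max works out to roughly the 2-minute guess above.

```java
// Sketch (hypothetical, not actual YARN code) of the penalty window for a
// recently-preempted queue, per the description's worst case.
public class PreemptionPenalty {
    // monitorIntervalMs: preemption monitor cycle time.
    // nodeHeartbeatIntervalMs: how often each node heartbeats.
    // containersPerNode: containers that could run on a single node.
    static long penaltyMillis(long monitorIntervalMs,
                              long nodeHeartbeatIntervalMs,
                              int containersPerNode) {
        // Worst case (no locality): one container granted per node heartbeat,
        // so refilling a node takes containersPerNode heartbeat rounds.
        long refillMs = nodeHeartbeatIntervalMs * (long) containersPerNode;
        return Math.max(monitorIntervalMs, refillMs);
    }

    public static void main(String[] args) {
        // Illustrative values: 3s monitor cycle, 1s heartbeats, 120 containers/node.
        System.out.println(penaltyMillis(3000, 1000, 120)); // prints 120000
    }
}
```

With these assumed values the heartbeat-refill term dominates at 120 seconds, matching the ~2-minute estimate.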

This message was sent by Atlassian JIRA
