hadoop-yarn-issues mailing list archives

From "Eric Payne (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit
Date Mon, 31 Aug 2015 19:20:46 GMT

     [ https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Payne updated YARN-3769:
    Attachment: YARN-3769.001.branch-2.8.patch

One thing I've thought about for a while is adding a "lazy preemption" mechanism, which is: when
a container is marked preempted and has waited for max_wait_before_time, it becomes a "can_be_killed"
container. If another queue can allocate on a node with a "can_be_killed" container,
that container will be killed immediately to make room for the new containers.

I will upload a design doc shortly for review.
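The proposed mechanism above can be sketched roughly as follows. This is an illustrative outline only, not YARN code; the class, method names, and the wait interval are assumptions for the sake of the example:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of the "lazy preemption" idea: a container is first only
 * *marked* preempted; after a wait interval it becomes "can_be_killed",
 * and it is actually killed only when another queue needs room on its node.
 * All names and the interval value are hypothetical.
 */
public class LazyPreemption {
    // Assumed stand-in for max_wait_before_time.
    static final long MAX_WAIT_MS = 15_000;

    // containerId -> time (ms) at which it was marked preempted
    private final Map<String, Long> markedAt = new HashMap<>();

    /** Step 1: the preemption monitor marks a container instead of killing it. */
    void markPreempted(String containerId, long nowMs) {
        markedAt.putIfAbsent(containerId, nowMs);
    }

    /** Step 2: after the wait elapses, the container is merely "can_be_killed". */
    boolean canBeKilled(String containerId, long nowMs) {
        Long t = markedAt.get(containerId);
        return t != null && nowMs - t >= MAX_WAIT_MS;
    }

    /** Step 3: kill only when another queue actually needs room on the node. */
    boolean maybeKillFor(String containerId, boolean otherQueueNeedsNode, long nowMs) {
        if (otherQueueNeedsNode && canBeKilled(containerId, nowMs)) {
            markedAt.remove(containerId);
            return true; // container is killed to make room
        }
        return false;
    }

    public static void main(String[] args) {
        LazyPreemption lp = new LazyPreemption();
        lp.markPreempted("c1", 0);
        System.out.println(lp.maybeKillFor("c1", true, 1_000));  // too early: false
        System.out.println(lp.maybeKillFor("c1", true, 20_000)); // killable and needed: true
    }
}
```

The point of the two-phase approach is that marking is cheap and reversible, so nothing is destroyed unless another queue's demand actually materializes on that node.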

[~leftnoteasy], because it's been a couple of months since the last activity on this JIRA,
would it be better to use this JIRA for the purpose of making the preemption monitor "user-limit"
aware and open a separate JIRA to address a redesign?

Towards that end, I am uploading a couple of patches:
- {{YARN-3769.001.branch-2.7.patch}} is a patch against 2.7 (which also applies to 2.6) that we have been using
internally. This fix has dramatically reduced the instances of "ping-pong"-ing that I outlined
in [the comment above|https://issues.apache.org/jira/browse/YARN-3769?focusedCommentId=14573619&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14573619].

- {{YARN-3769.001.branch-2.8.patch}} is similar to the fix made in 2.7, but it also takes
into consideration node label partitions.
Thanks for your help and please let me know what you think.
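Conceptually, making the preemption monitor user-limit aware means it should not count as preemptable any resources the capacity scheduler would immediately hand back because the user is still below their limit. A minimal sketch of that idea follows; the names and the single-scalar resource model are illustrative assumptions, not the actual patch:

```java
import java.util.Map;

/**
 * Illustrative sketch (not the actual patch): when computing how much a
 * queue can give up, cap each user's contribution at the amount above
 * their user limit, so containers the scheduler would immediately
 * re-allocate to the same under-limit user are never preempted.
 */
public class UserLimitAwarePreemption {

    /** Resources modeled as a single scalar (e.g. memory in MB) for brevity. */
    public static long preemptableFromQueue(Map<String, Long> usedPerUser,
                                            long userLimit) {
        long preemptable = 0;
        for (long used : usedPerUser.values()) {
            // Only the portion above the user limit is truly preemptable;
            // anything below it would "ping-pong" right back to this user.
            preemptable += Math.max(0, used - userLimit);
        }
        return preemptable;
    }

    public static void main(String[] args) {
        Map<String, Long> used = Map.of("alice", 6144L, "bob", 2048L);
        // With a 4096 MB user limit, only alice's 2048 MB overage counts.
        System.out.println(preemptableFromQueue(used, 4096));
    }
}
```

Without the user-limit cap, the monitor would see the whole queue overage as preemptable, kill containers, and watch the scheduler give them straight back, which is exactly the churn described below.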

> Preemption occurring unnecessarily because preemption doesn't consider user limit
> ---------------------------------------------------------------------------------
>                 Key: YARN-3769
>                 URL: https://issues.apache.org/jira/browse/YARN-3769
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 2.6.0, 2.7.0, 2.8.0
>            Reporter: Eric Payne
>            Assignee: Wangda Tan
>         Attachments: YARN-3769.001.branch-2.7.patch, YARN-3769.001.branch-2.8.patch
> We are seeing the preemption monitor preempting containers from queue A and then seeing
> the capacity scheduler giving them immediately back to queue A. This happens quite often
> and causes a lot of churn.

This message was sent by Atlassian JIRA
