hadoop-yarn-issues mailing list archives

From "Kuhu Shukla (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-4280) CapacityScheduler reservations may not prevent indefinite postponement on a busy cluster
Date Wed, 06 Jul 2016 15:47:11 GMT

     [ https://issues.apache.org/jira/browse/YARN-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kuhu Shukla updated YARN-4280:
    Attachment: YARN-4280.007.patch

Thank you so much [~leftnoteasy] for the detailed review and offline explanation. I have updated
the patch for point 1: it now subtracts max(child.headroom, none()) from parentLimits if QUEUE_SKIPPED
is received.

For point 2, I think it would still work as follows:

Given the queue configuration in the above example with all queues at max-capacity=100%, when
the first QUEUE_SKIPPED is received by a from a1, the parent limit for a will be set to (50-2),
since childlimits.getHeadroom will be 2. Now, when {{getResourceLimitsOfChild}} is called with
parentLimits=48, the value of {{parentMaxAvailableResource}} will be zero and the childLimit
for a2 will be (0+24), which would prevent a2 from going through with its assignment request of 1.
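
As a rough sketch of the arithmetic above (not the actual scheduler code): the function name {{child_limit}} is hypothetical, the formula "parent limit minus parent usage, plus child usage, capped at the child's max" is my assumption about how the limit is derived, and the usage figures (48 for a, 24 for a2) are inferred from the results quoted above rather than taken from the example itself.

```python
def child_limit(parent_limit, parent_used, child_used, child_max):
    """Toy version of the child-limit calculation (assumed formula)."""
    parent_max_available = parent_limit - parent_used
    return min(parent_max_available + child_used, child_max)

# Assumed values, inferred from the numbers in the message.
a_limit = 50       # limit for parent queue a before the adjustment
a1_headroom = 2    # headroom reported when a1 returns QUEUE_SKIPPED
a_used = 48        # usage of a that makes parentMaxAvailableResource zero
a2_used = 24       # usage of a2 that yields a child limit of (0+24)
a2_max = 100       # max-capacity=100% for all queues

# With the fix: subtract the skipped child's headroom from the parent limit.
reduced_limit = a_limit - max(a1_headroom, 0)
limit_with_fix = child_limit(reduced_limit, a_used, a2_used, a2_max)

# Without the fix: the parent limit is passed down unchanged.
limit_without_fix = child_limit(a_limit, a_used, a2_used, a2_max)

request = 1
a2_can_assign_with_fix = a2_used + request <= limit_with_fix
a2_can_assign_without_fix = a2_used + request <= limit_without_fix
```

Under these assumed numbers, a2 is held at its current usage with the adjusted limit, while the unadjusted limit would still let the request of 1 through.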

Let me know your thoughts/concerns regarding this. Thanks a lot!

> CapacityScheduler reservations may not prevent indefinite postponement on a busy cluster
> ----------------------------------------------------------------------------------------
>                 Key: YARN-4280
>                 URL: https://issues.apache.org/jira/browse/YARN-4280
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>    Affects Versions: 2.6.1, 2.8.0, 2.7.1
>            Reporter: Kuhu Shukla
>            Assignee: Kuhu Shukla
>         Attachments: YARN-4280.001.patch, YARN-4280.002.patch, YARN-4280.003.patch, YARN-4280.004.patch, YARN-4280.005.patch, YARN-4280.006.patch, YARN-4280.007.patch
> Consider the following scenario:
> There are 2 queues, A (25% of the total capacity) and B (75%); both can run at total cluster
> capacity. There are 2 applications: appX runs on Queue A, always asking for 1 GB containers (non-AM),
> and appY runs on Queue B, asking for 2 GB containers.
> The user limit is high enough for the application to reach 100% of the cluster resource.

> appX is running at total cluster capacity, full of 1 GB containers, releasing only one
> container at a time. appY comes in with a request for a 2 GB container, but only 1 GB is free.
> Ideally, since appY is in the underserved queue, it has higher priority and should reserve
> for its 2 GB request. Since this request puts the alloc+reserve above the total capacity of the
> cluster, the reservation is not made. appX comes in with a 1 GB request and, since 1 GB is still
> available, the request is allocated.
> This can continue indefinitely, causing priority inversion.
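
The scenario in the description can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not CapacityScheduler code: the function name and parameters are hypothetical, the cluster is treated as a single pool of memory, and a "reservation" is modeled as simply holding the free 1 GB back for appY (the real scheduler reserves on a specific node).

```python
def rounds_until_appy_runs(total_gb, max_rounds, allow_reservation):
    """Return the round in which appY gets its 2 GB container, or None.

    Toy model: appX starts out holding the whole cluster in 1 GB
    containers and releases exactly one container per round.
    """
    used_x = total_gb   # appX occupies the entire cluster
    reserved = 0        # GB held back for appY's pending 2 GB request
    for round_no in range(1, max_rounds + 1):
        used_x -= 1                 # appX releases one 1 GB container
        free = total_gb - used_x
        if free >= 2:
            return round_no         # appY's 2 GB request finally fits
        if allow_reservation:
            reserved = free         # hold the free space for appY
        if free - reserved >= 1:
            used_x += 1             # appX grabs the free 1 GB again
    return None

# Reservation refused (the reported behaviour): appY never gets to run.
starved = rounds_until_appy_runs(total_gb=8, max_rounds=1000, allow_reservation=False)

# Reservation honoured: appY runs as soon as a second container is released.
served = rounds_until_appy_runs(total_gb=8, max_rounds=1000, allow_reservation=True)
```

In this model the outcome does not depend on how many rounds are simulated: without a reservation, every freed 1 GB goes straight back to appX.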

This message was sent by Atlassian JIRA
