hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4280) CapacityScheduler reservations may not prevent indefinite postponement on a busy cluster
Date Fri, 23 Oct 2015 15:25:28 GMT

    [ https://issues.apache.org/jira/browse/YARN-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971154#comment-14971154 ]

Jason Lowe commented on YARN-4280:

It's a sticky problem.  The problem with doing the resource check is that it can prevent the
reservation from being fulfilled indefinitely.  For example, consider a situation like this:
* root queue (near 100% utilization)
** parent queue P (near max capacity)
*** leaf queue A (well under capacity)
*** leaf queue B (almost all of P's utilization)
** leaf queue C (the remainder of root - P)

We have an application X in queue A that needs a large resource.  If we do a limit check against
P's max capacity or the root's max capacity, it won't fit.  If we don't make the reservation,
then the app in A could be indefinitely postponed.  So let's say we go ahead and let the reservation
occur.  If the resource to fill that reservation was freed from within the P queue hierarchy
then we're OK.  If it's not, then we cannot fulfill the reservation without running over
P's max capacity.  So in the latter case, do we leave the reservation?  Does this in turn
prevent apps in C from making progress because app X's reservations start locking down the
cluster, waiting for the apps in queue B to free up resources?
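
To make the limit check concrete, here is a rough sketch of walking the queue hierarchy
from leaf to root.  This is a toy model, not CapacityScheduler code: the Queue class, the
first_blocking_queue helper and all of the GB figures are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Queue:
    name: str
    max_capacity: float              # hypothetical absolute limit, in GB
    used: float = 0.0                # hypothetical current usage, in GB
    parent: Optional["Queue"] = None


def first_blocking_queue(leaf: Queue, request: float) -> Optional[str]:
    """Walk from the leaf up to the root and return the name of the first
    queue whose max capacity the request would exceed, or None if it fits."""
    queue = leaf
    while queue is not None:
        if queue.used + request > queue.max_capacity:
            return queue.name
        queue = queue.parent
    return None


# Made-up numbers for a 100 GB cluster shaped like the example above.
root = Queue("root", max_capacity=100, used=98)
p = Queue("P", max_capacity=60, used=58, parent=root)
a = Queue("A", max_capacity=60, used=2, parent=p)
b = Queue("B", max_capacity=60, used=56, parent=p)
c = Queue("C", max_capacity=100, used=40, parent=root)

print(first_blocking_queue(a, 8))  # app X's large request in queue A -> P
print(first_blocking_queue(a, 1))  # a small request in queue A -> None
```

With these numbers queue A itself has plenty of headroom, but the 8 GB request is rejected
at P, which is the situation where app X either reserves past P's limit or does not reserve
at all.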

Offhand I don't have a great answer for how to tackle the problem.  It seems like either we need
to start locking down parts of the cluster and potentially leave resources fallow, even for
other queues outside of P, to make sure app X will eventually get something, or we keep app
X from reserving and leave it vulnerable to indefinite postponement despite containers churning
in queue B.  It's like we need to make a reservation _within_ the P queue hierarchy for this
scenario, to make sure queue B isn't allowed to grab more resources while app X is waiting,
but I'm not sure that's right either.

> CapacityScheduler reservations may not prevent indefinite postponement on a busy cluster
> ----------------------------------------------------------------------------------------
>                 Key: YARN-4280
>                 URL: https://issues.apache.org/jira/browse/YARN-4280
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>    Affects Versions: 2.6.1, 2.8.0, 2.7.1
>            Reporter: Kuhu Shukla
>            Assignee: Kuhu Shukla
> Consider the following scenario:
> There are 2 queues, A (25% of the total capacity) and B (75%); both can run at total cluster
> capacity. There are 2 applications: appX runs on queue A, always asking for 1 GB containers (non-AM),
> and appY runs on queue B, asking for 2 GB containers.
> The user limit is high enough for the application to reach 100% of the cluster resource.
>
> appX is running at total cluster capacity, full with 1 GB containers, releasing only one
> container at a time. appY comes in with a request for a 2 GB container but only 1 GB is free.
> Ideally, since appY is in the underserved queue, it has higher priority and should reserve
> for its 2 GB request. Since this request puts the alloc+reserve above total capacity of the
> cluster, the reservation is not made. appX comes in with a 1 GB request and since 1 GB is still
> available, the request is allocated.
> This can continue indefinitely, causing priority inversion.
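
The scenario in the issue description can be reproduced with a very small simulation.  This
is only a sketch under simplifying assumptions: a single node, one container released per
round, and a made-up function name and flag (rounds_until_appy_runs,
check_reservation_limit).  It is not how the CapacityScheduler is implemented.

```python
def rounds_until_appy_runs(total_gb: int, max_rounds: int,
                           check_reservation_limit: bool) -> int:
    """Return the round in which appY gets its 2 GB container, or -1 if it
    never does within max_rounds. Toy single-node model, not YARN code."""
    used = total_gb - 1          # appX fills all but 1 GB with 1 GB containers
    reserved = 0                 # GB currently reserved for appY
    for rnd in range(max_rounds):
        free = total_gb - used
        if free >= 2:
            return rnd           # appY's 2 GB request finally fits
        if reserved == 0:
            over_limit = used + 2 > total_gb   # alloc + reserve vs. total
            if not (check_reservation_limit and over_limit):
                reserved = 2     # hold the free space for appY
        if reserved == 0 and free >= 1:
            used += 1            # nothing reserved, so appX takes the free 1 GB
        used -= 1                # appX releases one container per round
    return -1


print(rounds_until_appy_runs(8, 1000, check_reservation_limit=True))
print(rounds_until_appy_runs(8, 1000, check_reservation_limit=False))
```

In this model, with the limit check enabled the reservation is refused every round, appX
reclaims the free 1 GB each time, and appY never runs.  With the check disabled, appY
reserves in the first round and is allocated in the next one.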

This message was sent by Atlassian JIRA
