hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-7149) Cross-queue preemption sometimes starves an underserved queue
Date Wed, 06 Sep 2017 14:53:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16155474#comment-16155474 ]

Jason Lowe commented on YARN-7149:
----------------------------------

Thanks for the report and analysis, Eric! It appears that YARN-5889's change to balance the
growth of users violated the assumptions behind the preemption monitor's forecasting of resource
assignments. One way to fix this is to change the preemption monitor's forecasting calculations
to use the old user limit calculations; however, I'm wondering whether we should revisit the
decision to change the user limit calculations in YARN-5889.

I understand the desire to balance user growth, but this seems likely to significantly slow
container assignment when there are multiple active users, in order to solve a problem that
I'm not sure occurs in practice. If I understand the concern properly, we want to avoid a
situation where one user quickly rushes ahead to their full user limit, well ahead of the other
users, and then, before the other users reach the same limit, something happens (e.g., more
users become active, the cluster loses capacity, etc.). That window should be very small in
practice (i.e., a few seconds to a few tens of seconds) because the user limit should reflect
capacity that is available right now. The speed at which the user limit is reached should be
limited only by the node heartbeat rate and how picky the container requests are.

I'm concerned about the new approach because it looks like it will significantly slow down
container assignments. For example, suppose there are two users, A and B, each with a single
active application that is asking for many more containers than the queue can provide. User
A's app is ahead of user B's app in the queue, and the queue is initially almost empty. Before
the user limit change, the user limit for each user would be 50%, since they are the only two
active users in the queue. As nodes heartbeat into the scheduler, the scheduler would
aggressively assign containers to user A, likely more than one per heartbeat, until the 50%
user limit is reached. At that point it would switch to assigning containers to user B, again
likely more than one per node heartbeat. Unless the container requests are very picky, it
should take only about two rounds of node heartbeats to satisfy both users, which should be
only a few seconds.

With the new limit calculation, the user limits for A and B will be only the minimal increment
over what they are already using. Therefore each node heartbeat will assign only one container
to each user rather than several, since the scheduler keeps running into the user limit before
the limit grows. The end result is that it will take many more node heartbeats to get
everything assigned, which users will perceive as a slow scheduler. Do we really need to keep
the assignments balanced as users grow to their limit? Doing so looks like a significant
performance hit, since we will keep hitting the limit on each node heartbeat and cut short the
number of containers we would normally assign per heartbeat.
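To make the heartbeat argument concrete, here is a toy simulation. It is a hypothetical model, not CapacityScheduler code: the function `heartbeats_to_fill`, the uniform `slots_per_node`, round-robin node heartbeats, and modelling the YARN-5889 limit as "current usage plus one container, capped at the fair share" are all simplifying assumptions based on the description above.

```python
def heartbeats_to_fill(num_nodes: int, slots_per_node: int, users: int,
                       incremental: bool) -> int:
    """Toy model: count node heartbeats until every user reaches its fair share.

    Hypothetical assumptions (not CapacityScheduler code): every node starts
    with the same number of free container slots, nodes heartbeat round-robin,
    every user wants unlimited containers, users are served in queue order,
    and the per-user limit is the only constraint.
    """
    fair_share = (num_nodes * slots_per_node) // users  # 50% each for 2 users
    used = [0] * users
    free = [slots_per_node] * num_nodes
    heartbeats = 0
    while sum(used) < users * fair_share:
        node = heartbeats % num_nodes
        heartbeats += 1
        for u in range(users):
            if incremental:
                # Modelled new-style limit: one container above current usage.
                limit = min(used[u] + 1, fair_share)
            else:
                # Old-style limit: the full fair share is available immediately.
                limit = fair_share
            # Assign as many containers on this node as the limit allows.
            while free[node] > 0 and used[u] < limit:
                used[u] += 1
                free[node] -= 1
    return heartbeats


if __name__ == "__main__":
    nodes, slots = 10, 8
    print("fixed 50% limit:", heartbeats_to_fill(nodes, slots, 2, incremental=False))  # 10
    print("incremental limit:", heartbeats_to_fill(nodes, slots, 2, incremental=True))  # 40
```

With 10 nodes of 8 free slots and two users, this model fills the queue in one pass over the nodes (10 heartbeats) under the fixed 50% limit, but needs 40 heartbeats under the incremental limit, because each heartbeat places only one container per user instead of filling the node.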

> Cross-queue preemption sometimes starves an underserved queue
> -------------------------------------------------------------
>
>                 Key: YARN-7149
>                 URL: https://issues.apache.org/jira/browse/YARN-7149
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>    Affects Versions: 2.9.0, 3.0.0-alpha3
>            Reporter: Eric Payne
>            Assignee: Eric Payne
>
> In branch-2 and trunk, I am consistently seeing some use cases where cross-queue preemption does not happen when it should. I do not see this in branch-2.8.
> Use Case:
> | | *Size* | *Minimum Container Size* |
> |MyCluster | 20 GB | 0.5 GB |
> | *Queue Name* | *Capacity* | *Absolute Capacity* | *Minimum User Limit Percent (MULP)* | *User Limit Factor (ULF)* |
> |Q1 | 50% = 10 GB | 100% = 20 GB | 10% = 1 GB | 2.0 |
> |Q2 | 50% = 10 GB | 100% = 20 GB | 10% = 1 GB | 2.0 |
> - {{User1}} launches {{App1}} in {{Q1}} and consumes all resources (20 GB)
> - {{User2}} launches {{App2}} in {{Q2}} and requests 10 GB
> - _Note: containers are 0.5 GB._
> - Preemption monitor kills 2 containers (equals 1 GB) from {{App1}} in {{Q1}}.
> - Capacity Scheduler assigns 2 containers (equals 1 GB) to {{App2}} in {{Q2}}.
> - _No more containers are ever preempted, even though {{Q2}} is far underserved_
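The sizes in the quoted use case can be checked with a few lines of arithmetic. This is a sketch only: `QueueConfig` and `containers` are hypothetical helpers, and it assumes the documented Capacity Scheduler meaning of user-limit-factor (a multiple of the queue's configured capacity that a single user may acquire). It makes no claim about why preemption stops.

```python
from dataclasses import dataclass

CLUSTER_GB = 20.0
CONTAINER_GB = 0.5  # all containers in the report are 0.5 GB


@dataclass(frozen=True)
class QueueConfig:
    """Hypothetical holder for the per-queue values in the table."""
    capacity_pct: float
    mulp_pct: float
    ulf: float

    def capacity_gb(self) -> float:
        return self.capacity_pct / 100 * CLUSTER_GB

    def mulp_gb(self) -> float:
        return self.mulp_pct / 100 * self.capacity_gb()

    def max_single_user_gb(self) -> float:
        # user-limit-factor multiplies the queue's configured capacity.
        return self.ulf * self.capacity_gb()


def containers(gb: float) -> int:
    return round(gb / CONTAINER_GB)


q1 = QueueConfig(capacity_pct=50, mulp_pct=10, ulf=2.0)
q2 = QueueConfig(capacity_pct=50, mulp_pct=10, ulf=2.0)

requested = containers(10.0)  # App2 asks for 10 GB -> 20 containers
preempted = 2                 # containers killed from App1, per the report

print(q2.capacity_gb(), q2.mulp_gb(), q1.max_single_user_gb())  # 10.0 1.0 20.0
print(requested - preempted, "containers still pending for App2")  # 18 ...
```

A ULF of 2.0 on a 10 GB queue is what lets User1 grow to the full 20 GB cluster in Q1. The 2 containers (1 GB) that were preempted happen to equal Q2's MULP share (10% of 10 GB), leaving 18 of App2's 20 requested containers unsatisfied.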


