hadoop-yarn-issues mailing list archives

From "Eric Payne (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (YARN-4781) Support intra-queue preemption for fairness ordering policy.
Date Thu, 01 Mar 2018 23:25:01 GMT

    [ https://issues.apache.org/jira/browse/YARN-4781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382847#comment-16382847
] 

Eric Payne edited comment on YARN-4781 at 3/1/18 11:24 PM:
-----------------------------------------------------------

A lot has happened since this JIRA was opened, but I think there is still value in pursuing
the original intent. That is, intra-queue preemption should consider FairOrderingPolicy.
{quote}Currently, if a job in queue A is using 100% of the cluster resources, and a new job
arrives in queue A, it sometimes cannot even get an application master!
{quote}
{quote}one big query is taking all resources of a queue lets say Q1. And when i am launching
another query in Q1, almost always it is hanging in ACCEPTED
{quote}
[~milesc] and [~anuptiwari], I think this use case is covered by YARN-2009 and related JIRAs.
I think this JIRA covers a slightly different use case.

FairOrderingPolicy tries to evenly assign containers across users and across apps within a
user (as long as the user is below the user limit). Currently, the FairOrderingPolicy does
not honor application priority AFAICT.
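
As a rough illustration of the "evenly assign containers" idea, the sketch below hands out containers one at a time to whichever app currently uses the least. It is a toy model, not the FairOrderingPolicy code: the function name `assign_fairly` and the app names are made up, it only balances across apps (not across users), and it ignores user limits, container sizes, and application priority.

```python
def assign_fairly(used, containers):
    """Give each container to the app with the smallest current usage.

    `used` maps a (hypothetical) app name to its usage in arbitrary units.
    Ties are broken by app name so the result is deterministic.
    """
    used = dict(used)  # work on a copy; the caller's dict is left untouched
    for _ in range(containers):
        neediest = min(used, key=lambda app: (used[app], app))
        used[neediest] += 1
    return used


result = assign_fairly({"app1": 4, "app2": 1, "app3": 0}, containers=4)
print(result)
```

In this toy run the app that already holds the most (app1) receives nothing until the others catch up, which is the behavior the paragraph above describes for allocation. Preemption is a separate mechanism and is not modeled here.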

We have seen the following in a large and extremely busy queue where FairOrderingPolicy is
set: one user takes up a lot of the queue, and later users then fight for the remaining
resources. The youngest users / apps are constantly preempted, while the larger, older
user is not preempted.

For example,
 QueueA: minimum-user-limit-percent = 25
 QueueA: resources = 1000
| |Used|Pending|Preempted|
|User1 / App1|400|0|0|
|User2 / App2|300|0|0|
|User3 / App3|300|0|0|
|User4 / App4|0|100|0|
 - Intra-queue preemption preempts 50 from App2 and 50 from App3.

| |Used|Pending|Preempted|
|User1 / App1|400|0|0|
|User2 / App2|250|0|50|
|User3 / App3|250|0|50|
|User4 / App4|100|0|0|
 - App4 finishes and resources are given back to App2 and App3.

| |Used|Pending|Preempted|
|User1 / App1|400|0|0|
|User2 / App2|300|0|50|
|User3 / App3|300|0|50|
 - Then, User4 submits App5, and the process repeats.

| |Used|Pending|Preempted|
|User1 / App1|400|0|0|
|User2 / App2|250|0|100|
|User3 / App3|250|0|100|
|User4 / App5|100|0|0|
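
The cycle in the tables above can be replayed with a few lines of arithmetic. This is only a bookkeeping sketch of the numbers in the example, not the scheduler's preemption logic: the `preempt_for` helper is hypothetical, and it hard-codes the assumption, taken from the example, that the request is split evenly across App2 and App3.

```python
def preempt_for(used, preempted, victims, demand):
    """Take `demand` evenly from `victims` and record it as preempted."""
    share = demand // len(victims)
    for app in victims:
        used[app] -= share
        preempted[app] += share
    return share * len(victims)


used = {"App1": 400, "App2": 300, "App3": 300}
preempted = {"App1": 0, "App2": 0, "App3": 0}

# Two rounds: User4 submits App4, it finishes, then User4 submits App5.
for newcomer in ("App4", "App5"):
    freed = preempt_for(used, preempted, victims=["App2", "App3"], demand=100)
    used[newcomer] = freed  # the new app receives what was preempted
    print(newcomer, "running:", used, "| preempted so far:", preempted)
    if newcomer == "App4":
        # App4 finishes; its resources flow back to App2 and App3.
        released = used.pop(newcomer)
        for app in ("App2", "App3"):
            used[app] += released // 2

print("App1 preempted in total:", preempted["App1"])
```

After both rounds App2 and App3 have each lost 100 in total while App1 has lost nothing, which is the imbalance the example is meant to show.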

Then, while all 4 users have running apps, User5 comes along and can't get any resources.
They see that User1 is using 60% more resources than User2 or User3 (400 vs. 250) and wonder
why they can't get anything. (Yes, I recognize that the reason in this case is MULP = 25%,
but I'm trying to keep the use case simple.)
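
The MULP remark can be checked with a simplified version of the user-limit rule described in the CapacityScheduler documentation: each active user's limit is the larger of an equal split and the configured minimum percentage. The function below is a sketch of that rule only. It ignores user-limit-factor, queue elasticity, and rounding to container sizes, and the name `user_limit` is made up for this example.

```python
def user_limit(queue_resources, active_users, mulp_percent):
    """Simplified per-user limit: max(equal split, minimum-user-limit-percent)."""
    equal_split = queue_resources / active_users
    minimum = queue_resources * mulp_percent / 100
    return max(equal_split, minimum)


QUEUE = 1000  # QueueA resources from the example
MULP = 25     # minimum-user-limit-percent from the example

for users in (2, 4, 5):
    limit = user_limit(QUEUE, users, MULP)
    served = int(QUEUE // limit)  # how many users fit at the full limit
    print(f"{users} active users: limit={limit:.0f}, users served at limit={served}")
```

Under this simplified rule the limit stays at 250 once a fifth user arrives, so only four users fit at the full limit, which matches the remark that User5 is held back by MULP = 25%.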

This is somewhat simplified because in our case, we have up to 50 active users, and since
the queue is large, the difference between the largest user and the others is even more apparent.

 

[~sunilg] and [~leftnoteasy], Thoughts?


> Support intra-queue preemption for fairness ordering policy.
> ------------------------------------------------------------
>
>                 Key: YARN-4781
>                 URL: https://issues.apache.org/jira/browse/YARN-4781
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: scheduler
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>            Priority: Major
>
> We introduced the fairness queue policy in YARN-3319, which lets large applications
> make progress without starving small applications. However, if a large application takes the
> queue’s resources and its containers have long lifespans, small applications
> could still wait a long time for resources, and SLAs cannot be guaranteed.
> Instead of waiting for applications to release resources on their own, we need to preempt
> resources in queues that have the fairness policy enabled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

