hadoop-yarn-issues mailing list archives

From "Rohit Agarwal (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3633) With Fair Scheduler, cluster can logjam when there are too many queues
Date Wed, 13 May 2015 23:18:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542895#comment-14542895 ]

Rohit Agarwal commented on YARN-3633:
-------------------------------------

Can you elaborate on the scenario where the AMs would come up but the containers would not? There
is still a cluster-wide maxAMShare, which is, let's say, 0.5 times the cluster capacity.

A dry run of my proposed change:

Cluster Resource: 20GB
Applications are submitted to 4 queues simultaneously: queues a, b, c, and d. Each application
requests an AM of size 3GB and containers of size 3GB.
Fair share for each queue = 5GB.
maxAMShare is 0.5
Let's say {{SomeMinimumSizeEnoughToRunOneContainer}} is 3GB.


queue a will start AM1. (maxAMShare = max(0.5*5GB, 3GB) = 3GB). cluster-wide AMShare = 3GB < 0.5*20GB
queue b will start AM2. (maxAMShare = max(0.5*5GB, 3GB) = 3GB). cluster-wide AMShare = 6GB < 0.5*20GB
queue c will start AM3. (maxAMShare = max(0.5*5GB, 3GB) = 3GB). cluster-wide AMShare = 9GB < 0.5*20GB
FS will try to run AM4 on queue d. But now it would hit the cluster-wide maxAMShare limit
(9GB + 3GB = 12GB > 0.5*20GB). So, nothing will run there. Then, FS will try to run something
on queue a (or b or c), and so a container for application1 would start.

This would repeat. FS will try to run AM4 on queue d. It would again hit the cluster-wide
maxAMShare limit. It would then try to run something on queue b (or c), and so a container
for application2 would start.

And so on.
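
The dry run above can be sketched as a small toy model in Python. This is not FairScheduler code: the function name, the round-based scheduling order, and the use of `<=` for the limit checks are assumptions made for illustration, and `min_am_gb` stands in for {{SomeMinimumSizeEnoughToRunOneContainer}}.

```python
def simulate_dry_run(cluster_gb=20, queues=("a", "b", "c", "d"),
                     am_gb=3, container_gb=3,
                     max_am_share=0.5, min_am_gb=3):
    """Toy model of the dry run (hypothetical; not actual FairScheduler code)."""
    fair_share_gb = cluster_gb / len(queues)
    # Proposed per-queue AM limit: never below the assumed minimum size.
    queue_am_limit_gb = max(max_am_share * fair_share_gb, min_am_gb)
    # Cluster-wide AM limit, unchanged by the proposal.
    cluster_am_limit_gb = max_am_share * cluster_gb

    used_gb = 0       # memory used by AMs and containers together
    am_used_gb = 0    # memory used by AMs only
    running_ams = []  # queues whose AM has started, in start order
    containers = {q: 0 for q in queues}

    progress = True
    while progress:
        progress = False
        # Step 1: try to start an AM in every queue that does not have one yet.
        for q in queues:
            if q in running_ams:
                continue
            fits_queue = am_gb <= queue_am_limit_gb
            fits_cluster_am = am_used_gb + am_gb <= cluster_am_limit_gb
            fits_memory = used_gb + am_gb <= cluster_gb
            if fits_queue and fits_cluster_am and fits_memory:
                running_ams.append(q)
                am_used_gb += am_gb
                used_gb += am_gb
                progress = True
        # Step 2: start one container for the running app with the fewest containers.
        if running_ams and used_gb + container_gb <= cluster_gb:
            q = min(running_ams, key=lambda name: containers[name])
            containers[q] += 1
            used_gb += container_gb
            progress = True

    return {
        "queue_am_limit_gb": queue_am_limit_gb,
        "running_ams": running_ams,
        "pending_ams": [q for q in queues if q not in running_ams],
        "containers": containers,
        "used_gb": used_gb,
    }


result = simulate_dry_run()
print(result["running_ams"], result["pending_ams"], result["used_gb"])
```

With the numbers from the dry run, the model starts AMs on queues a, b, and c, leaves AM4 on queue d pending, and still starts one container for each running application. It does not model AMs finishing or preemption.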

Eventually, the AM of one of app1, app2, or app3 will finish, at which point FS should
schedule AM4 on queue d.

I agree that queue d does not get its fair share until one of app1, app2, and app3 completes
(which is why I am unsure how my proposed change would work when preemption is enabled).
But I think that is still better than not scheduling anything.

> With Fair Scheduler, cluster can logjam when there are too many queues
> ----------------------------------------------------------------------
>
>                 Key: YARN-3633
>                 URL: https://issues.apache.org/jira/browse/YARN-3633
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.6.0
>            Reporter: Rohit Agarwal
>            Assignee: Rohit Agarwal
>            Priority: Critical
>
> It's possible to logjam a cluster by submitting many applications at once in different
> queues.
> For example, let's say there is a cluster with 20GB of total memory. Let's say 4 users
> submit applications at the same time. The fair share of each queue is 5GB. Let's say that
> maxAMShare is 0.5. So, each queue has at most 2.5GB memory for AMs. If all the users requested
> AMs of size 3GB - the cluster logjams. Nothing gets scheduled even when 20GB of resources
> are available.
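
The arithmetic in the description can be checked with a short Python sketch. The function names are hypothetical and the `<=` comparison is an assumption about how the per-queue limit is applied. The two-queue case at the end is an added illustration, not part of the original report.

```python
def queue_am_limit_gb(cluster_gb, num_queues, max_am_share):
    """Per-queue AM limit as described: maxAMShare times the queue's fair share."""
    fair_share_gb = cluster_gb / num_queues
    return max_am_share * fair_share_gb


def am_can_start(cluster_gb, num_queues, max_am_share, am_gb):
    """True if a single AM of am_gb fits under the per-queue AM limit (assumed <=)."""
    return am_gb <= queue_am_limit_gb(cluster_gb, num_queues, max_am_share)


# Scenario from the description: 20GB cluster, 4 queues, maxAMShare 0.5, 3GB AMs.
print(queue_am_limit_gb(20, 4, 0.5))  # 2.5
print(am_can_start(20, 4, 0.5, 3))    # False: no queue can start its AM

# Added illustration: with only 2 queues the same AM fits (limit is 5.0GB).
print(am_can_start(20, 2, 0.5, 3))    # True
```

Because every queue has the same 2.5GB limit and every AM needs 3GB, the check fails for all four queues at once, which is the logjam.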



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
