hadoop-yarn-issues mailing list archives

From "Arun Suresh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3633) With Fair Scheduler, cluster can logjam when there are too many queues
Date Tue, 30 Jun 2015 23:55:05 GMT

    [ https://issues.apache.org/jira/browse/YARN-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609278#comment-14609278 ]

Arun Suresh commented on YARN-3633:
-----------------------------------

I think the line introduced by the patch needs to be synchronized, and the same goes for
{{removeApp}}, where we subtract. Both add to or subtract from "totalAmResourceUsage", which
is defined in the {{FairScheduler}}, and {{Resources#addTo/subtractFrom}} actually performs
a get followed by a set. The value can change between the two if some other AM is added or
removed, possibly during a concurrently running continuous scheduling attempt.
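
As a rough illustration of the concern, here is a minimal sketch of guarding a shared running total so that the read-modify-write in add and subtract cannot interleave. The class {{AmUsageTracker}}, its methods, and the use of a plain {{long}} in place of a YARN {{Resource}} are all assumptions made for the example; this is not the FairScheduler code and not necessarily how the patch resolves it.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the shared AM usage total; not a YARN class.
final class AmUsageTracker {
    // Guarded by "this": every read and write goes through a synchronized method.
    private long totalAmMemoryMb;

    // Read-modify-write under the lock, so a concurrent add/remove cannot slip in between.
    synchronized void addApp(long amMemoryMb) {
        totalAmMemoryMb += amMemoryMb;
    }

    // The subtracting side needs the same lock as the adding side.
    synchronized void removeApp(long amMemoryMb) {
        totalAmMemoryMb -= amMemoryMb;
    }

    synchronized long total() {
        return totalAmMemoryMb;
    }

    // Runs matching add/remove pairs from several threads and returns the final total.
    static long runConcurrently(int threads, int opsPerThread) {
        AmUsageTracker tracker = new AmUsageTracker();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < opsPerThread; i++) {
                    tracker.addApp(1024);
                    tracker.removeApp(1024);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("worker threads did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return tracker.total();
    }
}
```

Because every add is paired with a remove, the final total should come back to zero when both methods take the same lock. Without {{synchronized}}, the unguarded get and set could lose updates and leave a nonzero residue.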

> With Fair Scheduler, cluster can logjam when there are too many queues
> ----------------------------------------------------------------------
>
>                 Key: YARN-3633
>                 URL: https://issues.apache.org/jira/browse/YARN-3633
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.6.0
>            Reporter: Rohit Agarwal
>            Assignee: Rohit Agarwal
>            Priority: Critical
>         Attachments: YARN-3633-1.patch, YARN-3633.patch
>
>
> It's possible to logjam a cluster by submitting many applications at once in different
> queues.
> For example, let's say there is a cluster with 20GB of total memory. Let's say 4 users
> submit applications at the same time. The fair share of each queue is 5GB. Let's say that
> maxAMShare is 0.5. So, each queue has at most 2.5GB memory for AMs. If all the users requested
> AMs of size 3GB - the cluster logjams. Nothing gets scheduled even when 20GB of resources
> are available.
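
The arithmetic in the quoted description can be checked with a small sketch. It is a simplified model that uses only the numbers given in the issue (20GB cluster, 4 queues, maxAMShare of 0.5, 3GB AMs). The class and method names are hypothetical, and the real FairScheduler check may take more factors into account.

```java
// Hypothetical helper that models only the arithmetic from the issue description.
final class LogjamExample {
    // Per-queue AM limit: equal fair share of the cluster, scaled by maxAMShare.
    static double maxAmMemoryPerQueueGb(double clusterGb, int queues, double maxAmShare) {
        double fairShareGb = clusterGb / queues;
        return fairShareGb * maxAmShare;
    }

    // An AM can start only if it fits within the queue's AM limit.
    static boolean amFits(double amGb, double amLimitGb) {
        return amGb <= amLimitGb;
    }
}
```

With the values from the description, `maxAmMemoryPerQueueGb(20.0, 4, 0.5)` gives a 2.5GB limit per queue, and `amFits(3.0, 2.5)` is false for every queue, so no AM starts even though the whole cluster is idle.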



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
