hadoop-yarn-issues mailing list archives

From "Rohit Agarwal (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3633) With Fair Scheduler, cluster can logjam when there are too many queues
Date Tue, 30 Jun 2015 23:31:05 GMT

    [ https://issues.apache.org/jira/browse/YARN-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609248#comment-14609248 ]

Rohit Agarwal commented on YARN-3633:

Yes, {{clusterMaxAMShare}} is acting as an upper limit.

To maintain the current behavior, we should keep the default {{clusterMaxAMShare}} at 0.5.
Right now, the default for {{queueMaxAMShare}} is 0.5, which results in an implicit {{clusterMaxAMShare}}
of 0.5: no queue allows more than 50% of its resources to be allocated to
AMs, and hence no more than 50% of the cluster resources can be allocated to AMs.
With this change, {{queueMaxAMShare}} only restricts AMs when there is already at least one AM
running in the queue. So, {{clusterMaxAMShare}} is needed to prevent the cluster from being
overrun with AMs (YARN-1913).
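
To make the interaction between the two limits concrete, here is a minimal sketch of the check as described above. It is not the code from the patch: the class name, the method {{canRunAm}}, and the use of plain megabyte counts instead of YARN resource objects are all assumptions made for illustration.

```java
class AmShareSketch {
    /**
     * Hypothetical admission check for a new AM. All sizes are in MB.
     * A negative share means "do not restrict".
     */
    static boolean canRunAm(long amMb,
                            long queueAmUsageMb, long queueFairShareMb, double queueMaxAMShare,
                            long clusterAmUsageMb, long clusterMb, double clusterMaxAMShare) {
        // Cluster-wide cap: always enforced unless negative.
        if (clusterMaxAMShare >= 0
                && clusterAmUsageMb + amMb > clusterMaxAMShare * clusterMb) {
            return false;
        }
        // Queue cap: only enforced once the queue already runs at least one AM.
        if (queueMaxAMShare >= 0 && queueAmUsageMb > 0
                && queueAmUsageMb + amMb > queueMaxAMShare * queueFairShareMb) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Scenario from the issue description: 20GB cluster, 4 queues, 3GB AMs.
        long clusterMb = 20 * 1024, fairShareMb = 5 * 1024, amMb = 3 * 1024;
        long clusterAmUsageMb = 0;
        int admitted = 0;
        for (int queue = 0; queue < 4; queue++) {
            // Each queue is empty, so its own AM usage is 0.
            if (canRunAm(amMb, 0, fairShareMb, 0.5, clusterAmUsageMb, clusterMb, 0.5)) {
                clusterAmUsageMb += amMb;
                admitted++;
            }
        }
        System.out.println(admitted + " of 4 AMs admitted"); // prints "3 of 4 AMs admitted"
    }
}
```

In this sketch the first AM of every queue passes the queue check, and the cluster cap of 10GB stops the fourth 3GB AM, so the cluster makes progress without being filled with AMs.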

We should set {{clusterMaxAMShare}} to a negative value only in those cases where we would set {{queueMaxAMShare}}
to a negative value, i.e. when we don't want to restrict AM usage.


Regarding synchronization, I am wondering why the existing line in {{addAMResourceUsage}} is not
synchronized. Is this code called concurrently? Also, if I should synchronize, should I synchronize
the whole method or just the line I added?
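
For reference, the two options could look roughly like the sketch below. This is only an illustration of the trade-off, not the scheduler's code: the class {{AmUsageSketch}}, its fields, and the helper {{runConcurrently}} are hypothetical, and plain {{long}} counters stand in for the real resource objects.

```java
class AmUsageSketch {
    private long queueAmUsageMb;   // stands in for the existing per-queue counter
    private long clusterAmUsageMb; // stands in for the counter updated by the added line

    // Option 1: synchronize the whole method, so both updates happen under one lock.
    synchronized void addWholeMethod(long amMb) {
        queueAmUsageMb += amMb;
        clusterAmUsageMb += amMb;
    }

    // Option 2: guard only the added line. The first update stays unguarded,
    // which is safe only if callers are already serialized by some other lock.
    void addAddedLineOnly(long amMb) {
        queueAmUsageMb += amMb;
        synchronized (this) {
            clusterAmUsageMb += amMb;
        }
    }

    synchronized long clusterAmUsageMb() {
        return clusterAmUsageMb;
    }

    // Calls option 1 from several threads and returns the final cluster counter.
    static long runConcurrently(int threads, int callsPerThread) {
        AmUsageSketch tracker = new AmUsageSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < callsPerThread; j++) {
                    tracker.addWholeMethod(1);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return tracker.clusterAmUsageMb();
    }
}
```

Either option only helps if every writer of the shared counter takes the same lock. If the method is always called with a scheduler-wide lock already held, neither form of synchronization would be needed, which is why it matters whether this code is called concurrently.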

> With Fair Scheduler, cluster can logjam when there are too many queues
> ----------------------------------------------------------------------
>                 Key: YARN-3633
>                 URL: https://issues.apache.org/jira/browse/YARN-3633
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.6.0
>            Reporter: Rohit Agarwal
>            Assignee: Rohit Agarwal
>            Priority: Critical
>         Attachments: YARN-3633-1.patch, YARN-3633.patch
> It's possible to logjam a cluster by submitting many applications at once in different
> queues.
> For example, let's say there is a cluster with 20GB of total memory. Let's say 4 users
> submit applications at the same time. The fair share of each queue is 5GB. Let's say that
> maxAMShare is 0.5. So, each queue has at most 2.5GB memory for AMs. If all the users request
> AMs of size 3GB, the cluster logjams. Nothing gets scheduled even when 20GB of resources
> are available.
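
The arithmetic in the description can be checked with a short sketch. It models the behavior before the change, where the queue limit applies to every AM, and it assumes the fair share is an equal split of the cluster across queues. The class and method names are made up for this example.

```java
class LogjamExample {
    /** Counts how many queues can start their first AM when the queue cap always applies. */
    static int admittedAms(long clusterMb, int queues, double maxAMShare, long amMb) {
        long fairShareMb = clusterMb / queues;         // 20GB / 4 = 5GB per queue
        double amBudgetMb = maxAMShare * fairShareMb;  // 0.5 * 5GB = 2.5GB for AMs
        int admitted = 0;
        for (int queue = 0; queue < queues; queue++) {
            if (amMb <= amBudgetMb) {
                admitted++;
            }
        }
        return admitted;
    }

    public static void main(String[] args) {
        // 3GB AMs exceed the 2.5GB budget in every queue, so nothing starts.
        System.out.println(admittedAms(20 * 1024, 4, 0.5, 3 * 1024)); // prints 0
    }
}
```

With 2GB AMs the same call admits all four, which shows that the logjam comes from the AM size exceeding the per-queue budget and not from a shortage of cluster memory.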

This message was sent by Atlassian JIRA