hadoop-yarn-issues mailing list archives

From "Steven Rand (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6960) definition of active queue allows idle long-running apps to distort fair shares
Date Sun, 20 Aug 2017 12:59:02 GMT

    [ https://issues.apache.org/jira/browse/YARN-6960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16134416#comment-16134416 ]

Steven Rand commented on YARN-6960:

[~daniel@cloudera.com], I've uploaded a patch proposing a new definition of queue activity.
It still needs tests, but I wanted to see how the community feels about this change first,
and revise it as necessary based on feedback, before writing them.

My understanding of a queue's demand is that it's the cumulative current usage of all apps
in the queue plus the cumulative additional resources requested by all apps in the queue.
Therefore, if the apps in a queue have scaled down to only their AMs and aren't requesting
additional resources, the queue's demand equals the AMs' usage. As soon as any app requests
more resources, its demand exceeds the AM usage, and the queue becomes active.
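
To make the rule above concrete, here is a minimal Java sketch of it next to the existing runnable-app rule. Everything in it is hypothetical: `AppSketch`, `QueueSketch` and their methods are illustrative names, resources are reduced to a single memory value in MB instead of YARN's multi-dimensional resources, and every app added to a queue is assumed to be runnable. It is not the code in YARN-6960.001.patch.

```java
import java.util.ArrayList;
import java.util.List;

final class DemandActivityDemo {
  public static void main(String[] args) {
    QueueSketch idle = new QueueSketch();
    idle.addApp(new AppSketch(1024, 0, 0));             // long-running app scaled down to its AM
    System.out.println(idle.isActiveByRunnableApps());  // true
    System.out.println(idle.isActiveByDemand());        // false

    QueueSketch busy = new QueueSketch();
    busy.addApp(new AppSketch(1024, 0, 2048));          // same app, now asking for more containers
    System.out.println(busy.isActiveByDemand());        // true
  }
}

// Hypothetical, simplified view of one runnable app: memory only, in MB.
final class AppSketch {
  final long amUsageMb;     // memory held by the AM container
  final long otherUsageMb;  // memory held by all non-AM containers
  final long pendingMb;     // additional memory requested but not yet allocated

  AppSketch(long amUsageMb, long otherUsageMb, long pendingMb) {
    this.amUsageMb = amUsageMb;
    this.otherUsageMb = otherUsageMb;
    this.pendingMb = pendingMb;
  }

  // Demand as described above: current usage plus outstanding requests.
  long demandMb() {
    return amUsageMb + otherUsageMb + pendingMb;
  }
}

// Hypothetical leaf queue; every app added to it is assumed to be runnable.
final class QueueSketch {
  private final List<AppSketch> apps = new ArrayList<>();

  void addApp(AppSketch app) {
    apps.add(app);
  }

  // Existing rule from YARN-2026: at least one runnable app.
  boolean isActiveByRunnableApps() {
    return !apps.isEmpty();
  }

  // Proposed rule: the queue's demand exceeds what its AMs alone are using.
  boolean isActiveByDemand() {
    long demand = 0;
    long amUsage = 0;
    for (AppSketch app : apps) {
      demand += app.demandMb();
      amUsage += app.amUsageMb;
    }
    return demand > amUsage;
  }
}
```

Using the AM usage as the baseline means an app that still holds non-AM containers, or that has pending requests, keeps its queue active, while an app idling with only its AM does not.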

I've tested this patch and it seems to have the desired effect. Going back to the example
in the description, {{root.c}} and {{root.d}} have equal fair shares despite the idle applications
in {{root.a}} and {{root.b}}.

> definition of active queue allows idle long-running apps to distort fair shares
> -------------------------------------------------------------------------------
>                 Key: YARN-6960
>                 URL: https://issues.apache.org/jira/browse/YARN-6960
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.8.1, 3.0.0-alpha4
>            Reporter: Steven Rand
>            Assignee: Steven Rand
>         Attachments: YARN-6960.001.patch
> YARN-2026 introduced the notion of only considering active queues when computing the
> fair share of each queue. The definition of an active queue is a queue with at least one
> runnable app:
> {code}
>   public boolean isActive() {
>     return getNumRunnableApps() > 0;
>   }
> {code}
> One case that this definition of activity doesn't account for is long-running applications
> that scale dynamically. Such an application might request many containers when jobs are
> running, but scale down to very few containers, or to only the AM container, when no jobs
> are running.
> Even when such an application has scaled down to a negligible amount of demand and
> utilization, the queue that it's in is still considered to be active, which defeats the
> purpose of YARN-2026. For example, consider this scenario:
> 1. We have queues {{root.a}}, {{root.b}}, {{root.c}}, and {{root.d}}, all of which have
> the same weight.
> 2. Queues {{root.a}} and {{root.b}} contain long-running applications that currently
> have only one container each (the AM).
> 3. An application in queue {{root.c}} starts and uses the whole cluster except for the
> small amount in use by {{root.a}} and {{root.b}}. An application in {{root.d}} starts and
> has enough demand to use half of the cluster. Because all four queues are active and
> equally weighted, each has a fair share of roughly 25% of the cluster, so the app in
> {{root.d}} can only preempt the app in {{root.c}} up to roughly 25% of the cluster's
> resources, while the app in {{root.c}} keeps about 75%.
> Ideally in this example, the app in {{root.d}} would be able to preempt the app in {{root.c}}
> up to 50% of the cluster, which would be possible if the idle apps in {{root.a}} and
> {{root.b}} didn't cause those queues to be considered active.
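
The arithmetic in this scenario can be sketched as an equal-weight split of the cluster among the active queues. This is a deliberate simplification: it assumes the cluster is 100 units (so shares read as percentages) and ignores min/max shares and the other inputs the FairScheduler's real computation uses. `FairShareSketch.fairShares` is a hypothetical helper, not a Hadoop API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class FairShareDemo {
  public static void main(String[] args) {
    // Four queues with the same weight, as in the scenario.
    Map<String, Double> weights = new LinkedHashMap<>();
    for (String queue : new String[] {"root.a", "root.b", "root.c", "root.d"}) {
      weights.put(queue, 1.0);
    }

    // Runnable-app rule: the idle AMs keep root.a and root.b active.
    System.out.println(FairShareSketch.fairShares(100.0, weights, weights.keySet()));
    // {root.a=25.0, root.b=25.0, root.c=25.0, root.d=25.0}

    // If root.a and root.b were not considered active.
    System.out.println(FairShareSketch.fairShares(100.0, weights, Set.of("root.c", "root.d")));
    // {root.a=0.0, root.b=0.0, root.c=50.0, root.d=50.0}
  }
}

// Hypothetical helper: splits `cluster` among the active queues in proportion
// to their weights; inactive queues get a fair share of 0.
final class FairShareSketch {
  static Map<String, Double> fairShares(double cluster, Map<String, Double> weights,
      Set<String> active) {
    double activeWeight = 0.0;
    for (Map.Entry<String, Double> e : weights.entrySet()) {
      if (active.contains(e.getKey())) {
        activeWeight += e.getValue();
      }
    }
    Map<String, Double> shares = new LinkedHashMap<>();
    for (Map.Entry<String, Double> e : weights.entrySet()) {
      boolean counts = active.contains(e.getKey()) && activeWeight > 0.0;
      shares.put(e.getKey(), counts ? cluster * e.getValue() / activeWeight : 0.0);
    }
    return shares;
  }
}
```

The second split is the 50% outcome the description calls ideal: once {{root.a}} and {{root.b}} drop out of the denominator, {{root.c}} and {{root.d}} each get half of the cluster.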
> One way to address this is to update the definition of an active queue to be a queue
> containing one or more non-AM containers. This way, if all apps in a queue scale down to
> only the AM, other queues' fair shares aren't affected.
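
The alternative rule in the paragraph above, a queue is active if it contains at least one non-AM container, can be sketched the same way. As before, `AppContainers` and `NonAmQueueSketch` are hypothetical names, every app is assumed to be runnable, and the sketch tracks only container counts, not container sizes.

```java
import java.util.ArrayList;
import java.util.List;

final class NonAmActivityDemo {
  public static void main(String[] args) {
    NonAmQueueSketch scaledDown = new NonAmQueueSketch();
    scaledDown.addApp(new AppContainers(1, 0));  // only the AM is left
    System.out.println(scaledDown.isActive());   // false

    NonAmQueueSketch nearlyIdle = new NonAmQueueSketch();
    nearlyIdle.addApp(new AppContainers(1, 1));  // one small, idle worker container
    System.out.println(nearlyIdle.isActive());   // true, the downside noted below
  }
}

// Hypothetical per-app container counts for one runnable app.
final class AppContainers {
  final int amContainers;     // normally 1
  final int nonAmContainers;  // containers launched for actual work

  AppContainers(int amContainers, int nonAmContainers) {
    this.amContainers = amContainers;
    this.nonAmContainers = nonAmContainers;
  }
}

// Hypothetical leaf queue using the "at least one non-AM container" rule.
final class NonAmQueueSketch {
  private final List<AppContainers> runnableApps = new ArrayList<>();

  void addApp(AppContainers app) {
    runnableApps.add(app);
  }

  boolean isActive() {
    for (AppContainers app : runnableApps) {
      if (app.nonAmContainers > 0) {
        return true;  // a single non-AM container anywhere in the queue is enough
      }
    }
    return false;
  }
}
```

Because the rule only counts containers, a tiny idle worker container activates the queue just as a large busy one would.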
> The benefit of this approach is that it's quite simple. The downside is that it doesn't
> account for apps that are idle and using almost no resources, but still have at least one
> non-AM container.
> There are a couple of other options that seem plausible to me, but they're much more
> complicated, and I think this proposal makes good progress while adding minimal extra
> complexity.
> Does this seem like a reasonable change? I'm certainly open to better ideas as well.
> Thanks,
> Steve

This message was sent by Atlassian JIRA
