hadoop-yarn-issues mailing list archives

From "Manikandan R (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps
Date Thu, 29 Mar 2018 11:19:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16418782#comment-16418782 ]

Manikandan R commented on YARN-4606:

[~eepayne] Thanks for your detailed explanation. Sorry for the delay.
{quote}In this scenario, User2 wants to start App2 but User1 is consuming all resources in
the queue with App1. When App1 releases a resource, however, it is not given to App2. The
resource is given back to App1, which brings its Pending value down to 19. This is incorrect
behavior since Queue1 has room for 2 AMs.{quote}
I was trying to understand this behaviour in the current code (without my patch) and found
that the AM container is allocated to App2 only after App1 completes when the cluster is running

In my single-node pseudo setup, total cluster resources are 8192 MB and 8 vcores, with only one
queue (default) at 100% allocation. Max AM resources are 2048 MB and 2 vcores, since the max AM
resource percent is 0.2. I submitted an app (say App1) through DS with num_containers set to 20.
While App1 was running with around 15 pending containers, I submitted a second app (say App2)
with num_containers set to 10. I can see that the AM container for App2 is allocated only after
App1 completes, which is not in line with your earlier comments. Am I missing anything here?
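A minimal sketch of how the 2048 MB, 2 vcores AM limit above could follow from the 0.2 setting. It assumes the raw value (percent times queue resource) is rounded up to a multiple of the scheduler's minimum allocation, taken here as 1024 MB and 1 vcore. Those minimum values, the rounding rule, and the {{MaxAmResourceSketch}} / {{maxAmLimit}} names are assumptions for illustration, not taken from the setup described or from the CapacityScheduler code.

```java
public class MaxAmResourceSketch {

    // Rounds value up to the next multiple of step (step must be > 0).
    static long roundUp(long value, long step) {
        return ((value + step - 1) / step) * step;
    }

    // Hypothetical helper: AM limit = queue resource * percent,
    // rounded up to a multiple of the assumed minimum allocation.
    static long maxAmLimit(long queueResource, double maxAmPercent, long minAllocation) {
        long raw = (long) Math.ceil(queueResource * maxAmPercent);
        return roundUp(raw, minAllocation);
    }

    public static void main(String[] args) {
        // Values from the setup above: 8192 MB, 8 vcores, max AM resource percent 0.2.
        // Assumed minimum allocation: 1024 MB, 1 vcore.
        System.out.println("max AM memory (MB): " + maxAmLimit(8192, 0.2, 1024));
        System.out.println("max AM vcores: " + maxAmLimit(8, 0.2, 1));
    }
}
```

Under these assumptions the raw memory value is about 1638 MB, which rounds up to 2048 MB, leaving room for two AM containers of 1024 MB each.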
{quote}However, I'm not sure of the best way to get the values for a queue's Used AM Resources
and Max AM Resources from this context. Those may be capacity scheduler-specific values.{quote}
Yes. But I do see some equivalents available in {{FSQueueMetrics}}.

> CapacityScheduler: applications could get starved because computation of #activeUsers
considers pending apps 
> -------------------------------------------------------------------------------------------------------------
>                 Key: YARN-4606
>                 URL: https://issues.apache.org/jira/browse/YARN-4606
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, capacityscheduler
>    Affects Versions: 2.8.0, 2.7.1
>            Reporter: Karam Singh
>            Assignee: Wangda Tan
>            Priority: Critical
>         Attachments: YARN-4606.1.poc.patch, YARN-4606.POC.patch
> Currently, if all applications belonging to the same user in a LeafQueue are pending (caused by
max-am-percent, etc.), ActiveUsersManager still considers that user an active user. This
could lead to starvation of active applications, for example:
> - App1 (belongs to user1) and app2 (belongs to user2) are active; app3 (belongs to user3) and
app4 (belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, only two users (user1/user2) are able to allocate new resources.
So the computed user-limit-resource could be lower than expected.
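The effect described in the example can be sketched as follows. This is a deliberately simplified model, not the CapacityScheduler's actual user-limit computation: it assumes the per-user limit is just the queue resource divided evenly among active users. The 8192 MB queue size and the {{ActiveUsersSketch}}, {{App}}, {{countUsers}}, and {{userLimitMb}} names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ActiveUsersSketch {

    // Hypothetical stand-in for an application and its owner.
    static final class App {
        final String user;
        final boolean pending; // true if the app has not been activated yet

        App(String user, boolean pending) {
            this.user = user;
            this.pending = pending;
        }
    }

    // Counts distinct users; pending apps contribute only if includePending is true.
    static int countUsers(List<App> apps, boolean includePending) {
        Set<String> users = new HashSet<>();
        for (App app : apps) {
            if (includePending || !app.pending) {
                users.add(app.user);
            }
        }
        return users.size();
    }

    // Simplified assumption: the queue resource is split evenly among active users.
    static long userLimitMb(long queueMb, int activeUsers) {
        return queueMb / Math.max(1, activeUsers);
    }

    // The scenario from the description: app1/app2 active, app3/app4 pending.
    static List<App> scenario() {
        return Arrays.asList(
            new App("user1", false),
            new App("user2", false),
            new App("user3", true),
            new App("user4", true));
    }

    public static void main(String[] args) {
        long queueMb = 8192; // illustrative queue size, not from the issue
        int withPending = countUsers(scenario(), true);
        int withoutPending = countUsers(scenario(), false);
        System.out.println("users counting pending apps: " + withPending
            + ", limit per user (MB): " + userLimitMb(queueMb, withPending));
        System.out.println("users excluding pending apps: " + withoutPending
            + ", limit per user (MB): " + userLimitMb(queueMb, withoutPending));
    }
}
```

In this model, counting user3 and user4 halves the limit available to user1 and user2, even though only those two users can actually allocate resources.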

