hadoop-yarn-issues mailing list archives

From "Eric Payne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps
Date Thu, 08 Mar 2018 19:32:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16391802#comment-16391802 ]

Eric Payne commented on YARN-4606:

[~manirajv06@gmail.com], thank you for the patch. The overall approach looks fine, but I have
a couple of concerns.
 - The behavior of assigning resources to schedulable applications has changed. With this
patch, in the following use case, resources are not assigned to the second app even though they
should be. I have not analyzed the behavior closely enough to debug the issue, but I want to
document it:
 -- Queue1 total resources: 40
 -- Queue1 Max Application Master Resources: 2
 -- Container sizes are all 1 resource
|*User Name*|*Application ID*|*Used AM Resources*|*Total Used Resources*|*Pending Resources*|
|User2|App2|0|0|1 (waiting for AM)|

 -- In this scenario, User2 wants to start App2, but User1 is consuming all resources in the
queue with App1. However, when App1 releases a resource, it is not given to App2. Instead, it
is given back to App1, which brings App1's Pending value down to 19. This is incorrect behavior,
since Queue1 has room for 2 AMs, so App2's AM should have received the released resource.

 - I think the {{TestRMHA}} unit tests need to be updated for this patch. They currently fail with:
TestRMHA.testFailoverAndTransitions:219->verifyClusterMetrics:754 Incorrect value for metric activeApplications expected:<1> but was:<0>
TestRMHA.testFailoverClearsRMContext:550->verifyClusterMetrics:754 Incorrect value for metric activeApplications expected:<1> but was:<0>

 - A couple of minor things:
 -- IIUC, the value stored in {{activeUsersOfPendingApps}} represents the number of users
that do not have any active applications. Is that correct? If so, I think it would be
clearer if it were called {{usersWithOnlyPendingApps}}.
 -- In {{AbstractUsersManager}} and {{ActiveUsersManager}}, *atleast* should be *at least*.
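To make the naming question concrete, here is a minimal, hypothetical Java sketch of the bookkeeping that a name like {{usersWithOnlyPendingApps}} implies. A user counts toward the active-user total only if they own at least one active (schedulable) application. Users whose applications are all pending are counted separately. The class {{UserCountSketch}} and its methods are illustrative assumptions, not the API of {{AbstractUsersManager}} or {{ActiveUsersManager}}. The {{main}} method replays the four-user scenario from the issue description.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not the real AbstractUsersManager/ActiveUsersManager API):
// a user is "active" only if they own at least one active application; users whose
// applications are all pending are reported separately.
final class UserCountSketch {
    private final Map<String, Integer> activeAppsPerUser = new HashMap<>();
    private final Map<String, Integer> pendingAppsPerUser = new HashMap<>();

    /** Records a newly submitted application that is still pending (e.g. blocked by the AM limit). */
    void addPendingApp(String user) {
        pendingAppsPerUser.merge(user, 1, Integer::sum);
    }

    /** Moves one of the user's pending applications to the active set. */
    void activateApp(String user) {
        // Decrement the pending count; returning null removes the entry when it reaches zero.
        pendingAppsPerUser.computeIfPresent(user, (u, n) -> n > 1 ? n - 1 : null);
        activeAppsPerUser.merge(user, 1, Integer::sum);
    }

    /** Number of users who can actually be allocated resources. */
    int numActiveUsers() {
        return activeAppsPerUser.size();
    }

    /** Number of users who have pending applications but no active ones. */
    int numUsersWithOnlyPendingApps() {
        int count = 0;
        for (String user : pendingAppsPerUser.keySet()) {
            if (!activeAppsPerUser.containsKey(user)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        UserCountSketch tracker = new UserCountSketch();
        for (String user : new String[] {"user1", "user2", "user3", "user4"}) {
            tracker.addPendingApp(user);
        }
        // Only app1 (user1) and app2 (user2) get past the AM limit.
        tracker.activateApp("user1");
        tracker.activateApp("user2");
        System.out.println(tracker.numActiveUsers() + " " + tracker.numUsersWithOnlyPendingApps());
    }
}
```

With this split, only users from the first count would feed into the user-limit computation, while the second count remains available for reporting.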

> CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps
> -------------------------------------------------------------------------------------------------------------
>                 Key: YARN-4606
>                 URL: https://issues.apache.org/jira/browse/YARN-4606
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, capacityscheduler
>    Affects Versions: 2.8.0, 2.7.1
>            Reporter: Karam Singh
>            Assignee: Wangda Tan
>            Priority: Critical
>         Attachments: YARN-4606.1.poc.patch, YARN-4606.POC.patch
> Currently, if all applications belonging to the same user in a LeafQueue are pending (e.g. because of max-am-percent), ActiveUsersManager still considers that user an active user. This could lead to starvation of active applications, for example:
> - App1 (belongs to user1) and app2 (belongs to user2) are active; app3 (belongs to user3) and app4 (belongs to user4) are pending.
> - ActiveUsersManager returns #active-users=4.
> - However, only two users (user1/user2) are able to allocate new resources. So the computed user-limit-resource could be lower than expected.
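The effect on the user limit can be illustrated with a deliberately simplified, hypothetical model. The queue's capacity is split evenly among the counted active users, but no user's limit drops below a floor set by a minimum user-limit percentage, in the spirit of the CapacityScheduler's {{minimum-user-limit-percent}} setting. The real computation works on {{Resource}} objects and also applies factors such as {{user-limit-factor}}. The class {{UserLimitSketch}}, its method, and the numbers (capacity 100, minimum 25%) are assumptions for illustration only.

```java
// Simplified, hypothetical model of a per-user limit. It is NOT the
// CapacityScheduler's actual computation; it only shows why over-counting
// active users lowers each user's share.
final class UserLimitSketch {

    /**
     * Splits queueCapacity evenly among activeUsers (rounding up), but never
     * returns less than minimumUserLimitPercent of the queue capacity.
     */
    static int userLimit(int queueCapacity, int activeUsers, int minimumUserLimitPercent) {
        int users = Math.max(1, activeUsers);                // guard against division by zero
        int evenShare = (queueCapacity + users - 1) / users; // ceiling division
        int floor = queueCapacity * minimumUserLimitPercent / 100;
        return Math.max(evenShare, floor);
    }

    public static void main(String[] args) {
        int capacity = 100;  // assumed queue capacity, in abstract resource units
        int minPercent = 25; // assumed minimum-user-limit-percent
        // Counting pending-only users as active (4 users): each user is capped at 25.
        System.out.println(userLimit(capacity, 4, minPercent));
        // Counting only users who can allocate (user1/user2): each user may use 50.
        System.out.println(userLimit(capacity, 2, minPercent));
    }
}
```

Under these assumptions, counting user3 and user4 halves the limit for user1 and user2, even though no one else can use the remaining capacity. This is the starvation described above.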

This message was sent by Atlassian JIRA

