hadoop-yarn-issues mailing list archives

From "Eric Payne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps
Date Mon, 25 Jun 2018 21:05:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16522811#comment-16522811 ]

Eric Payne commented on YARN-4606:

{quote}At the same time, this patch is less "strict" in terms of updates (specifically on
when?) compared to approaches discussed in our earlier patches.{quote}
The value for the number of active apps per user used to be calculated every time through the
scheduler loop, which was a performance problem. To avoid this heavy calculation,
YARN-5889 created the {{UsersManager}}. Instead of doing the calculation every time through
the loop, YARN-5889 recalculates these values only when an event occurs that could affect this
count, such as a new application, an application completing, a new container request, or a
completed container.
In the latest POC patch, {{activeUsersWithOnlyPendingApps}} is part of this flow, so it will
always be updated whenever anything happens that could affect this value.
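
The event-driven bookkeeping described above can be sketched roughly as follows. This is a simplified illustration, not the actual {{UsersManager}} code: the class {{ActiveUserCounts}} and all of its methods are hypothetical, and "active" is reduced here to "has at least one activated application", whereas the real scheduler also reacts to container requests and completions.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified stand-in for event-driven user counting; not the real UsersManager. */
class ActiveUserCounts {
    // Per-user number of activated apps and of apps still waiting for activation.
    private final Map<String, Integer> activatedApps = new HashMap<>();
    private final Map<String, Integer> pendingApps = new HashMap<>();

    // Cached values, refreshed only when an event changes the inputs.
    private int activeUsers;
    private int activeUsersWithOnlyPendingApps;

    void appSubmitted(String user) {
        pendingApps.merge(user, 1, Integer::sum);
        recompute();
    }

    void appActivated(String user) {
        decrement(pendingApps, user);
        activatedApps.merge(user, 1, Integer::sum);
        recompute();
    }

    void appCompleted(String user) {
        decrement(activatedApps, user);
        recompute();
    }

    // Readers on the scheduling path only read the cached values.
    int getActiveUsers() {
        return activeUsers;
    }

    int getActiveUsersWithOnlyPendingApps() {
        return activeUsersWithOnlyPendingApps;
    }

    // Lower a user's count by one and drop the entry when it reaches zero.
    private static void decrement(Map<String, Integer> counts, String user) {
        counts.computeIfPresent(user, (u, n) -> n > 1 ? n - 1 : null);
    }

    // Both counters are refreshed together, so they cannot drift apart.
    private void recompute() {
        activeUsers = activatedApps.size();
        int onlyPending = 0;
        for (String user : pendingApps.keySet()) {
            if (!activatedApps.containsKey(user)) {
                onlyPending++;
            }
        }
        activeUsersWithOnlyPendingApps = onlyPending;
    }
}
```

Because both counters are refreshed in the same {{recompute()}} call, a reader never has to rescan the applications on the scheduling path, which is the cost the event-driven approach is meant to avoid.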
{quote}Also, based on our earlier discussions, we need to depend on activeUsers.get() only
in certain contexts and on the sum of activeUsers.get() and activeUsersWithOnlyPendingApps.get() in
some other places. But the POC patch always depends on the latter value. I didn't understand this part.{quote}
I think you are referencing this comment from above:
{quote}My understanding is that user limit would use activeUsers, and for things like max AM limit
per user, we'd use activeUsers + activeUsersOfPendingApps{quote}
{{LeafQueue#activateApplications}} is the only thing that calls {{UsersManager#getNumActiveUsers}},
which it uses to calculate the user-specific AM limit, so it's the one that needs the sum of
activeUsers and {{activeUsersWithOnlyPendingApps}}.
{{UsersManager#computeUserLimit}} uses only activeUsers to calculate the headroom and user
limit, which is what we decided in the comment above. Is that your understanding of these

> CapacityScheduler: applications could get starved because computation of #activeUsers
considers pending apps 
> -------------------------------------------------------------------------------------------------------------
>                 Key: YARN-4606
>                 URL: https://issues.apache.org/jira/browse/YARN-4606
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, capacityscheduler
>    Affects Versions: 2.8.0, 2.7.1
>            Reporter: Karam Singh
>            Assignee: Manikandan R
>            Priority: Critical
>         Attachments: YARN-4606.001.patch, YARN-4606.002.patch, YARN-4606.003.patch, YARN-4606.004.patch,
YARN-4606.1.poc.patch, YARN-4606.POC.2.patch, YARN-4606.POC.3.patch, YARN-4606.POC.patch
> Currently, if all applications belonging to the same user in a LeafQueue are pending (caused by
max-am-percent, etc.), ActiveUsersManager still considers the user an active user. This
could lead to starvation of active applications, for example:
> - App1 (belongs to user1) and app2 (belongs to user2) are active; app3 (belongs to user3) and
app4 (belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, only two users (user1/user2) are able to allocate new resources.
So the computed user-limit-resource could be lower than expected.
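
To make the arithmetic in this scenario concrete, here is a minimal sketch that assumes limits are a plain even split across the counted users. The class {{LimitSketch}}, its method names, and the resource figures are hypothetical; the real CapacityScheduler computation also applies configured settings such as the minimum user limit percent and the user limit factor, which are left out here.

```java
/** Hypothetical illustration of which user count feeds which limit; not the real scheduler formulas. */
final class LimitSketch {
    private LimitSketch() {
    }

    // User limit: split the queue's resource among users that can actually allocate.
    static int userLimit(int queueResource, int activeUsers) {
        return queueResource / Math.max(1, activeUsers);
    }

    // Per-user AM limit: users whose apps are all still pending are counted as well.
    static int userAmLimit(int queueAmResource, int activeUsers,
                           int activeUsersWithOnlyPendingApps) {
        int users = activeUsers + activeUsersWithOnlyPendingApps;
        return queueAmResource / Math.max(1, users);
    }
}
```

With an assumed queue resource of 100 units, counting all four users gives {{userLimit(100, 4)}} = 25 per user, while counting only user1 and user2 gives {{userLimit(100, 2)}} = 50. That gap is the lower-than-expected limit described in the issue. For the AM limit, both groups are counted, so {{userAmLimit(20, 2, 2)}} = 5 for an assumed AM resource of 20 units.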
