hadoop-yarn-issues mailing list archives

From "Manikandan R (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps
Date Thu, 17 May 2018 13:00:03 GMT

    [ https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16479016#comment-16479016
] 

Manikandan R commented on YARN-4606:
------------------------------------

Attaching .002 patch for review.

{quote}Does this patch handles the case that one user has multiple pending apps? (Since it
doesn't store user to apps information).{quote}
The patch now handles this case.
{quote}Should we call this inside {{SchedulerApplicationAttempt#pullNewlyUpdatedContainers}}? 
I think we should remove active user from pending apps once AM container get allocated{quote}
Yes, but inside {{SchedulerApplicationAttempt#pullNewlyAllocatedContainers}}, and only after the
containers have been updated with tokens, because {{SchedulerApplicationAttempt#pullNewlyUpdatedContainers}}
handles only the INCREASE, DECREASE, PROMOTE and DEMOTE cases, not regular allocations.
{quote}Instead of using metrics, it might be better to use SchedulerApplicationAttempt#getAppAttemptResourceUsage
instead.{quote}
Not required, I think, as explained in the previous comment.
{quote}I am doing an in-depth review, but I would like to address a few things first regarding
method names and comments. I feel that it is important to be accurate in these areas in order
to eliminate confusion for those maintaining this code.{quote}
All related comments have been addressed.

In addition to the above changes, we have handled the case where an app stays in the ACCEPTED
state after all of its AM attempts have failed for some reason. We would like to decrement the
count in this case as well, and do so by signalling the scheduler with a new event type.
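
To illustrate the bookkeeping described above, here is a minimal sketch of a per-user count of
pending apps. It is not the code in the .002 patch: the class {{PendingAppTrackerSketch}} and its
method names are hypothetical. The assumption is that {{onAppNoLongerPending}} would be called
both when the AM container is allocated and when all AM attempts of an ACCEPTED app have failed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Hadoop YARN.
class PendingAppTrackerSketch {
    // user -> number of that user's apps that are still pending
    private final Map<String, Integer> pendingAppsPerUser = new HashMap<>();

    /** An app of this user entered the pending state. */
    void onAppPending(String user) {
        pendingAppsPerUser.merge(user, 1, Integer::sum);
    }

    /**
     * An app of this user left the pending state (assumed triggers: AM container
     * allocated, or all AM attempts failed). The entry is removed when the count
     * reaches zero; calling this for an unknown user is a no-op.
     */
    void onAppNoLongerPending(String user) {
        pendingAppsPerUser.computeIfPresent(user, (u, n) -> n > 1 ? n - 1 : null);
    }

    int pendingApps(String user) {
        return pendingAppsPerUser.getOrDefault(user, 0);
    }

    int usersWithPendingApps() {
        return pendingAppsPerUser.size();
    }

    public static void main(String[] args) {
        PendingAppTrackerSketch tracker = new PendingAppTrackerSketch();
        tracker.onAppPending("user3");
        tracker.onAppPending("user3"); // one user with two pending apps
        tracker.onAppPending("user4");
        tracker.onAppNoLongerPending("user3");
        System.out.println(tracker.pendingApps("user3"));    // 1
        System.out.println(tracker.usersWithPendingApps());  // 2
    }
}
```

Keeping a count per user, rather than a single flag, is what lets a user with several pending
apps stay in the map until the last of those apps leaves the pending state.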

Also, I am assuming that moving an app from one queue to another doesn't require changes, as it
happens only when the app is running?

Thanks [~sunilg] for providing suggestions in some of the above steps.

> CapacityScheduler: applications could get starved because computation of #activeUsers
considers pending apps 
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-4606
>                 URL: https://issues.apache.org/jira/browse/YARN-4606
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, capacityscheduler
>    Affects Versions: 2.8.0, 2.7.1
>            Reporter: Karam Singh
>            Assignee: Manikandan R
>            Priority: Critical
>         Attachments: YARN-4606.001.patch, YARN-4606.1.poc.patch, YARN-4606.POC.2.patch,
YARN-4606.POC.patch
>
>
> Currently, if all applications belonging to the same user in a LeafQueue are pending (caused by
max-am-percent, etc.), ActiveUsersManager still considers that user an active user. This
could lead to starvation of active applications, for example:
> - App1 (belongs to user1) and app2 (belongs to user2) are active; app3 (belongs to user3) and
app4 (belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, only two users (user1/user2) are able to allocate new resources,
so the computed user-limit-resource could be lower than expected.
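
The effect in the example above can be sketched with a deliberately simplified model. The
assumption here is that the queue resource is split evenly among the counted users; the actual
CapacityScheduler computation also involves settings such as minimum-user-limit-percent and
user-limit-factor. The class {{UserLimitSketch}}, its methods and the value 100 are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Simplified, hypothetical model; this is not the CapacityScheduler code.
class UserLimitSketch {
    /** Hypothetical app record: the owning user and whether the app is still pending. */
    static final class App {
        final String user;
        final boolean pending;

        App(String user, boolean pending) {
            this.user = user;
            this.pending = pending;
        }
    }

    /** Counts every user that owns at least one app, pending or not. */
    static long countAllUsers(List<App> apps) {
        return apps.stream().map(a -> a.user).distinct().count();
    }

    /** Counts only users that own at least one non-pending app. */
    static long countUsersWithActiveApps(List<App> apps) {
        return apps.stream().filter(a -> !a.pending).map(a -> a.user).distinct().count();
    }

    /** Even split of the queue resource among the counted users (assumed model). */
    static long perUserLimit(long queueResource, long users) {
        return users == 0 ? queueResource : queueResource / users;
    }

    public static void main(String[] args) {
        List<App> apps = Arrays.asList(
            new App("user1", false), new App("user2", false),
            new App("user3", true), new App("user4", true));
        long queueResource = 100; // arbitrary units, assumed for illustration

        // Counting pending-only users gives 4 users and a limit of 25 each.
        System.out.println(perUserLimit(queueResource, countAllUsers(apps)));
        // Counting only users with active apps gives 2 users and a limit of 50 each.
        System.out.println(perUserLimit(queueResource, countUsersWithActiveApps(apps)));
    }
}
```

Under this model, user1 and user2 are each capped at 25 units even though user3 and user4 cannot
consume anything, which is the starvation described in the issue.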



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

