hadoop-yarn-issues mailing list archives

From "Manikandan R (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps
Date Wed, 09 May 2018 13:57:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16468872#comment-16468872 ]

Manikandan R commented on YARN-4606:
------------------------------------

{quote}1) Does this patch handles the case that one user has multiple pending apps? (Since
it doesn't store user to apps information).{quote}
The patch doesn't handle this case. Whenever a user submits an app, CS increments the
activeUsersOfPendingApps count as part of accepting the application, irrespective of whether
the app was submitted by the same user or a different one.
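
To make the difference concrete, here is a minimal sketch contrasting a per-application counter (how the count behaves as described above) with per-user bookkeeping. The class {{PendingUserCounters}} and its methods are hypothetical names used only for illustration; they are not part of the patch or of YARN.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not a YARN class.
class PendingUserCounters {
    // Per-application counter: one increment per accepted app,
    // regardless of which user submitted it.
    private int perAppCount = 0;

    // Per-user bookkeeping: user name -> number of pending apps.
    private final Map<String, Integer> pendingAppsByUser = new HashMap<>();

    void acceptApplication(String user) {
        perAppCount++;
        pendingAppsByUser.merge(user, 1, Integer::sum);
    }

    int perAppCount() {
        return perAppCount;
    }

    int distinctPendingUsers() {
        return pendingAppsByUser.size();
    }
}
```

With two apps from user1 and one from user2, the per-application counter reaches 3 while only 2 distinct users are pending, which is the gap the review question points at.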

{quote}Should we call this inside SchedulerApplicationAttempt#pullNewlyUpdatedContainers?

I think we should remove active user from pending apps once AM container get allocated{quote}

While trying to understand this through real testing, I encountered a situation where {{SchedulerApplicationAttempt#pullNewlyUpdatedContainers}}
always returns an empty {{updatedContainers}}. I was wondering whether we could call {{abstractUsersManager.decrNumActiveUsersOfPendingApps()}}
inside {{SchedulerApplicationAttempt#pullNewlyAllocatedContainers}} instead, something like:

{code}
// Decrement only once, after the attempt is no longer waiting for its AM container.
if (!this.isWaitingForAMContainer()
    && !hasActiveUsersOfPendingAppsDecremented.get()) {
  this.queue.getAbstractUsersManager().decrNumActiveUsersOfPendingApps();
  hasActiveUsersOfPendingAppsDecremented.set(true);
}
{code}

If we plan to move the call to {{decrNumActiveUsersOfPendingApps}} from {{AppSchedulingInfo#updatePendingResources}}
to {{SchedulerApplicationAttempt}}, do we still need to check the AM usage against the max AM
limit? I don't think so. We hit the issue of the second app being accepted when we were calling {{decrNumActiveUsersOfPendingApps}}
inside {{abstractUsersManager.activateApplication()}}, and that too from {{AppSchedulingInfo#updatePendingResources}}.
I don't think it is required anymore.

{quote}Does hasActiveUsersOfPendingAppsDecremented need to be atomic? What is the benefit?{quote}

Not required, I guess. I was trying to be too defensive :)
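
For reference, a rough sketch of what the atomic flag would buy if the decrement could be reached from more than one thread without a shared lock. Whether that can actually happen in {{SchedulerApplicationAttempt}} is an assumption not verified here; {{OneShotDecrement}} and its members are hypothetical stand-ins, not YARN code.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the one-time decrement; not a YARN class.
class OneShotDecrement {
    private final AtomicBoolean decremented = new AtomicBoolean(false);
    private int activeUsersOfPendingApps;

    OneShotDecrement(int initialCount) {
        this.activeUsersOfPendingApps = initialCount;
    }

    // compareAndSet flips false -> true for exactly one caller, so the
    // counter is decremented at most once even under concurrent calls.
    void onAmContainerAllocated() {
        if (decremented.compareAndSet(false, true)) {
            synchronized (this) {
                activeUsersOfPendingApps--;
            }
        }
    }

    synchronized int count() {
        return activeUsersOfPendingApps;
    }
}
```

If every caller already holds the same lock when it reaches the check, a plain boolean field gives the same at-most-once behaviour and the atomic adds nothing.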

Will address the naming and comment-related review points once we settle on the flow.

> CapacityScheduler: applications could get starved because computation of #activeUsers
considers pending apps 
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-4606
>                 URL: https://issues.apache.org/jira/browse/YARN-4606
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, capacityscheduler
>    Affects Versions: 2.8.0, 2.7.1
>            Reporter: Karam Singh
>            Assignee: Manikandan R
>            Priority: Critical
>         Attachments: YARN-4606.001.patch, YARN-4606.1.poc.patch, YARN-4606.POC.2.patch,
YARN-4606.POC.patch
>
>
> Currently, if all applications belong to same user in LeafQueue are pending (caused by
max-am-percent, etc.), ActiveUsersManager still considers the user is an active user. This
could lead to starvation of active applications, for example:
> - App1(belongs to user1)/app2(belongs to user2) are active, app3(belongs to user3)/app4(belongs
to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, there're only two users (user1/user2) are able to allocate new resources.
So computed user-limit-resource could be lower than expected.
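
The effect on the computed limit can be shown with a deliberately simplified model: each user's share is the larger of an equal split among active users and a minimum percentage of the queue. This is not the exact CapacityScheduler computation, which has more terms; the queue size (102400 MB), the 25 percent minimum, and the names {{UserLimitExample}} and {{perUserLimitMb}} are all assumptions made up for this illustration.

```java
// Simplified, hypothetical model of a per-user limit; not the real YARN formula.
class UserLimitExample {
    // Returns the per-user limit in MB: the larger of an equal split among
    // active users and a minimum percentage of the queue's resource.
    static long perUserLimitMb(long queueMb, int activeUsers, int minUserLimitPercent) {
        long equalShare = queueMb / activeUsers;
        long minimumShare = queueMb * minUserLimitPercent / 100;
        return Math.max(equalShare, minimumShare);
    }

    public static void main(String[] args) {
        // Counting the two pending users (4 active users in total).
        System.out.println(perUserLimitMb(102400, 4, 25));
        // Counting only the two users that can actually allocate.
        System.out.println(perUserLimitMb(102400, 2, 25));
    }
}
```

Under these assumed numbers, counting four users yields 25600 MB per user, while counting only the two users that can allocate yields 51200 MB, so user1 and user2 are capped at half of what they could otherwise use.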



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

