hadoop-yarn-issues mailing list archives

From "Eric Payne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps
Date Thu, 31 May 2018 21:49:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16497232#comment-16497232 ]

Eric Payne commented on YARN-4606:

Thanks [~manirajv06@gmail.com] for the updated patch. Here are my comments so far:
- I am concerned that this implementation adds code that is specific to {{CapacityScheduler}}
inside {{AppSchedulingInfo}}. I feel that this sets a precedent that makes it hard to maintain
a clean separation between abstract and scheduler-specific code. Also, this only fixes the
problem for the {{CapacityScheduler}}. The previous fix in patch 001 relied on metrics,
which I realize is risky, but it was a more generic fix. I would be interested to hear
thoughts from [~sunilg] and [~leftnoteasy].
- Only the {{CapacityScheduler}} has been changed to handle the new {{AppAMAttemptsFailedSchedulerEvent}}.
Should the other schedulers handle that as well? If they don't handle it, don't we risk them
getting unhandled event exceptions?
- In all places where new {{LOG.debug(...)}} statements are added, please also enclose them
in {{if (LOG.isDebugEnabled())}}. This is for the sake of performance, so that the strings
are not built, passed to {{LOG.debug()}}, and then thrown away when debug logging is not enabled.
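
To illustrate the last point, here is a minimal sketch of the guard. {{StubLog}}, {{describe()}}, and the {{BUILDS}} counter are hypothetical stand-ins invented for this example so that it runs without any logging library; they are not part of the patch or of Hadoop. The counter only shows whether the message string was constructed.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class DebugGuardSketch {
    // Hypothetical stand-in for the real logger; it models only the two
    // methods discussed above (isDebugEnabled and debug).
    public static final class StubLog {
        private final boolean debugEnabled;

        public StubLog(boolean debugEnabled) {
            this.debugEnabled = debugEnabled;
        }

        public boolean isDebugEnabled() {
            return debugEnabled;
        }

        public void debug(String message) {
            if (debugEnabled) {
                System.out.println("DEBUG " + message);
            }
        }
    }

    // Counts how many times the message string is built.
    public static final AtomicInteger BUILDS = new AtomicInteger();

    // Hypothetical helper that builds a log message (assumed to be costly).
    public static String describe(String appId) {
        BUILDS.incrementAndGet();
        return "Application " + appId + " has no active attempts";
    }

    public static void unguarded(StubLog log, String appId) {
        // The argument is evaluated before debug() is called, so the string
        // is built even when debug logging is off.
        log.debug(describe(appId));
    }

    public static void guarded(StubLog log, String appId) {
        // The string is built only when debug logging is on.
        if (log.isDebugEnabled()) {
            log.debug(describe(appId));
        }
    }

    public static void main(String[] args) {
        StubLog log = new StubLog(false); // debug logging disabled
        unguarded(log, "app_1");
        int afterUnguarded = BUILDS.get();
        guarded(log, "app_1");
        int afterGuarded = BUILDS.get();
        System.out.println("builds after unguarded call: " + afterUnguarded);
        System.out.println("builds after guarded call:   " + afterGuarded);
    }
}
```

With debug logging disabled, only the unguarded call pays for building the string; the guarded call skips it entirely.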

> CapacityScheduler: applications could get starved because computation of #activeUsers
considers pending apps 
> -------------------------------------------------------------------------------------------------------------
>                 Key: YARN-4606
>                 URL: https://issues.apache.org/jira/browse/YARN-4606
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, capacityscheduler
>    Affects Versions: 2.8.0, 2.7.1
>            Reporter: Karam Singh
>            Assignee: Manikandan R
>            Priority: Critical
>         Attachments: YARN-4606.001.patch, YARN-4606.002.patch, YARN-4606.003.patch, YARN-4606.1.poc.patch,
YARN-4606.POC.2.patch, YARN-4606.POC.patch
> Currently, if all applications that belong to the same user in a LeafQueue are pending (caused by
max-am-percent, etc.), ActiveUsersManager still considers that user an active user. This
could lead to starvation of active applications, for example:
> - App1 (belongs to user1) and app2 (belongs to user2) are active; app3 (belongs to user3) and app4 (belongs
to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, only two users (user1/user2) are able to allocate new resources,
so the computed user-limit-resource could be lower than expected.
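
The arithmetic behind the example above can be sketched as follows. This is a deliberately simplified model that assumes an equal split of the queue's resource among the users counted as active; the real CapacityScheduler user-limit computation takes further settings into account, and the class name, method name, and the figure of 100 resource units are all hypothetical.

```java
public class ActiveUsersSketch {
    // Simplified, hypothetical model: each user counted as active gets an
    // equal share of the queue's resource. Not the actual scheduler formula.
    public static int perUserLimit(int queueResource, int activeUsers) {
        if (activeUsers <= 0) {
            throw new IllegalArgumentException("activeUsers must be positive");
        }
        return queueResource / activeUsers;
    }

    public static void main(String[] args) {
        int queueResource = 100; // assumed queue size in arbitrary units

        // Pending-only users (user3, user4) are counted as active.
        int limitWithPendingUsers = perUserLimit(queueResource, 4);

        // Only users that can actually allocate (user1, user2) are counted.
        int limitWithRunnableUsers = perUserLimit(queueResource, 2);

        System.out.println("limit with #active-users=4: " + limitWithPendingUsers);
        System.out.println("limit with #active-users=2: " + limitWithRunnableUsers);
    }
}
```

Under this model, counting the two pending-only users halves the limit for user1 and user2, so together they could use only half of the queue while the other half sits idle.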

This message was sent by Atlassian JIRA

