hadoop-yarn-issues mailing list archives

From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps
Date Wed, 20 Jan 2016 07:56:39 GMT

    [ https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15108183#comment-15108183 ]

Wangda Tan commented on YARN-4606:

Updated the description of the JIRA. The issue was originally found by [~karams] while doing
fairness ordering policy tests; pasting the original test case here just for reference:

Encountered while studying fairness behaviour with UserLimitPercent and UserLimitFactor during
the following test:

Ran GridMix with queue settings: Capacity=10, MaxCap=80, UserLimit=25, UserLimitFactor=32,
FairOrderingPolicy only. Encountered an application starvation situation where 33 applications
were running (190 apps completed out of 761 apps; the queue can run 345 containers) with a
total of 45 containers. The 12 extra containers all belonged to one app (which had around
18000 tasks); all other apps had only their AM running, and no other containers were given to
any of them. After that app finished, there were 32 AMs that kept running without any
containers for tasks being allocated.

GridMix was run with the following settings:
    gridmix.client.pending.queue.depth=10
    gridmix.job-submission.policy=REPLAY
    gridmix.client.submit.threads=5
    gridmix.submit.multiplier=0.0001
    gridmix.job.type=SLEEPJOB
    mapreduce.framework.name=yarn
    mapreduce.job.queuename=hive1
    mapred.job.queue.name=hive1
    gridmix.sleep.max-map-time=5000
    gridmix.sleep.max-reduce-time=5000
    gridmix.user.resolve.class=org.apache.hadoop.mapred.gridmix.RoundRobinUserResolver
with a users file containing 4 users for RoundRobinUserResolver.

> CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps
> -------------------------------------------------------------------------------------------------------------
>                 Key: YARN-4606
>                 URL: https://issues.apache.org/jira/browse/YARN-4606
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, capacityscheduler
>    Affects Versions: 2.8.0, 2.7.1
>            Reporter: Karam Singh
>            Assignee: Wangda Tan
> Currently, if all applications belonging to the same user in a LeafQueue are pending (caused by
> max-am-percent, etc.), ActiveUsersManager still considers that user an active user. This
> could lead to starvation of active applications, for example:
> - App1 (belongs to user1) and app2 (belongs to user2) are active; app3 (belongs to user3) and
>   app4 (belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, only two users (user1/user2) are able to allocate new resources,
>   so the computed user-limit-resource could be lower than expected.
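
To make the arithmetic in the example above concrete, here is a minimal sketch of how the active-user count feeds into a per-user limit. It is a simplified model, not the CapacityScheduler code: the function `user_limit` and its parameters are hypothetical names, the capacities are arbitrary units, and details such as rounding to the minimum allocation are left out. The assumed rule is that each user gets the larger of an equal share among active users and the configured user-limit percentage, capped by queue capacity times the user-limit factor.

```python
import math


def user_limit(current_capacity, active_users, user_limit_percent,
               queue_capacity, user_limit_factor):
    """Simplified per-user limit (illustrative model only, hypothetical names)."""
    # Equal share among the users counted as active.
    equal_share = math.ceil(current_capacity / active_users)
    # Guaranteed minimum share from the user-limit percentage.
    min_share = math.ceil(current_capacity * user_limit_percent / 100)
    # Upper bound from the user-limit factor.
    cap = int(queue_capacity * user_limit_factor)
    return min(max(equal_share, min_share), cap)


# Assumed numbers: 100 units of capacity, UserLimit=25, UserLimitFactor=32.
# Four users counted as active (user3/user4 only have pending apps):
counting_pending = user_limit(100, 4, 25, 100, 32)
# Only the two users that can actually allocate resources:
only_schedulable = user_limit(100, 2, 25, 100, 32)

print(counting_pending, only_schedulable)
```

Under these assumptions, counting the two pending-only users halves the limit for user1 and user2 (25 units each instead of 50), and the remaining capacity goes unused because user3 and user4 cannot allocate anything.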
