hadoop-yarn-issues mailing list archives

From "Arun Suresh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5540) scheduler spends too much time looking at empty priorities
Date Tue, 23 Aug 2016 20:13:20 GMT

    [ https://issues.apache.org/jira/browse/YARN-5540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15433534#comment-15433534 ]

Arun Suresh commented on YARN-5540:

Thanks for the patch [~jlowe]

Minor nits:
# You can remove the {{TODO: Shouldn't we activate even if numContainers = 0}} comment, since
the patch now takes care of it.
# You do not really need to pass the schedulerKey around, since you can extract it from the
request using {{SchedulerRequestKey::create(ResourceRequest)}}. But since some of the existing
methods still pass it around, it's not a must-fix for me.

Thinking out loud here... shouldn't we merge the two data structures (the resourceRequestMap
ConcurrentHashMap and the schedulerKeys TreeSet) into a single ConcurrentSkipListMap and return
its keySet() when getSchedulerKeys() is called?
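
To make the merge idea concrete, here is a rough sketch of what I have in mind. It is not the patch and not the existing scheduler code: {{MergedRequestIndex}}, {{Key}} and {{update}} are hypothetical names, {{Key}} is a simplified stand-in for SchedulerRequestKey that orders by priority only, and the inner map just tracks an outstanding container count per resource name. It also treats a key as empty once its last resource name is removed, which is a simplification of the real activation rules.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical sketch: one sorted concurrent map replaces the
// ConcurrentHashMap + TreeSet pair, so the key set cannot drift out of sync.
class MergedRequestIndex {

    // Simplified stand-in for SchedulerRequestKey (assumption: ordered by
    // priority only, lower value sorts first).
    public static final class Key implements Comparable<Key> {
        final int priority;
        public Key(int priority) { this.priority = priority; }
        @Override public int compareTo(Key o) { return Integer.compare(priority, o.priority); }
        @Override public boolean equals(Object o) { return o instanceof Key && ((Key) o).priority == priority; }
        @Override public int hashCode() { return Integer.hashCode(priority); }
        @Override public String toString() { return "Key(" + priority + ")"; }
    }

    // key -> (resource name -> outstanding container count)
    private final ConcurrentSkipListMap<Key, Map<String, Integer>> requests =
        new ConcurrentSkipListMap<>();

    // Records a request; a count of zero or less drops the entry, and a key
    // with no entries left is removed so the scheduler never iterates over it.
    // Note: compute* on ConcurrentSkipListMap is not guaranteed to be atomic,
    // so real code would still need the caller's existing locking.
    public void update(Key key, String resourceName, int numContainers) {
        if (numContainers > 0) {
            requests.computeIfAbsent(key, k -> new ConcurrentHashMap<>())
                    .put(resourceName, numContainers);
        } else {
            requests.computeIfPresent(key, (k, perName) -> {
                perName.remove(resourceName);
                return perName.isEmpty() ? null : perName; // null removes the key
            });
        }
    }

    // Sorted live view of the keys that still have outstanding requests.
    public Set<Key> getSchedulerKeys() {
        return requests.keySet();
    }
}
```

With this shape, getSchedulerKeys() is a sorted view backed by the map itself, so removing the last request at a priority also removes that priority from what the scheduler loop sees.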

> scheduler spends too much time looking at empty priorities
> ----------------------------------------------------------
>                 Key: YARN-5540
>                 URL: https://issues.apache.org/jira/browse/YARN-5540
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacity scheduler, fairscheduler, resourcemanager
>    Affects Versions: 2.7.2
>            Reporter: Nathan Roberts
>            Assignee: Jason Lowe
>         Attachments: YARN-5540.001.patch
> We're starting to see the capacity scheduler run out of scheduling horsepower when running 500-1000 applications on clusters with 4K nodes or so.
> This seems to be amplified by TEZ applications. TEZ applications have many more priorities (sometimes in the hundreds) than typical MR applications, and therefore the loop in the scheduler that examines every priority within every running application starts to be a hotspot. The priorities appear to stay around forever, even when there is no remaining resource request at that priority, causing us to spend a lot of time looking at nothing.
> jstack snippet:
> {noformat}
> "ResourceManager Event Processor" #28 prio=5 os_prio=0 tid=0x00007fc2d453e800 nid=0x22f3 runnable [0x00007fc2a8be2000]
>    java.lang.Thread.State: RUNNABLE
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.getResourceRequest(SchedulerApplicationAttempt.java:210)
>         - eliminated <0x00000005e73e5dc0> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:852)
>         - locked <0x00000005e73e5dc0> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp)
>         - locked <0x00000003006fcf60> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:527)
>         - locked <0x00000003001b22f8> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:415)
>         - locked <0x00000003001b22f8> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1224)
>         - locked <0x0000000300041e40> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler)
> {noformat}
