hadoop-yarn-issues mailing list archives

From "Nathan Roberts (JIRA)" <j...@apache.org>
Subject [jira] [Created] (YARN-3309) Capacity scheduler can wait a very long time for node locality
Date Mon, 09 Mar 2015 21:13:40 GMT
Nathan Roberts created YARN-3309:

             Summary: Capacity scheduler can wait a very long time for node locality
                 Key: YARN-3309
                 URL: https://issues.apache.org/jira/browse/YARN-3309
             Project: Hadoop YARN
          Issue Type: Bug
          Components: capacityscheduler
    Affects Versions: 2.6.0
            Reporter: Nathan Roberts

The capacity scheduler will delay scheduling a container on a rack-local node in hopes that
a node-local opportunity will come along (YARN-80). It does this by counting the number of
missed scheduling opportunities the application has had. When the count reaches a certain
threshold, the app will accept the rack-local node. The documented recommendation is to set
this threshold to the #nodes in the cluster.
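For illustration, the count-based check described above can be sketched roughly as follows (class and method names are hypothetical, not the actual CapacityScheduler code):

```java
// Illustrative sketch of YARN-80-style delay scheduling; not the real
// CapacityScheduler implementation.
public class DelaySchedulingSketch {

    // With the documented recommendation, the threshold equals the
    // number of nodes in the cluster.
    static boolean acceptRackLocal(long missedOpportunities, int nodeLocalityDelay) {
        // Accept a rack-local assignment only after enough node-local
        // scheduling opportunities have been passed up.
        return missedOpportunities >= nodeLocalityDelay;
    }

    public static void main(String[] args) {
        int clusterNodes = 4000; // hypothetical cluster size
        System.out.println(acceptRackLocal(10, clusterNodes));   // keep waiting
        System.out.println(acceptRackLocal(4000, clusterNodes)); // accept rack-local
    }
}
```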

However, there are some early-out optimizations that can lead to this delay being a very long time. For example, in allocateContainersToNode():
   // Try to schedule more if there are no reservations to fulfill
    if (node.getReservedContainer() == null) {
      if (calculator.computeAvailableContainers(node.getAvailableResource(),
          minimumAllocation) > 0) {
        if (LOG.isDebugEnabled()) {
          LOG.debug("Trying to schedule on node: " + node.getNodeName() +
              ", available: " + node.getAvailableResource());
        }
        root.assignContainers(clusterResource, node, false);
      }
    }

So, in a large cluster that is completely full (AvailableResource on each node is 0), SchedulingOpportunities will only increase at the container completion rate, not the heartbeat rate, which I think was the original assumption of YARN-80. On a large cluster this can lead to an hour+ of skipped scheduling opportunities, meaning the FIFO ordering of a queue is ignored for a very long time.
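A back-of-the-envelope calculation shows how the wait scales. All rates below are hypothetical, chosen only to make the arithmetic concrete:

```java
// Rough arithmetic for a full cluster where scheduling opportunities
// accrue only when containers complete. All numbers are hypothetical.
public class WaitEstimate {
    public static void main(String[] args) {
        int threshold = 4000;            // recommended threshold = #nodes
        double completionsPerSec = 1.0;  // assumed cluster-wide completion rate
        double waitSeconds = threshold / completionsPerSec;
        // 4000 s is over an hour of ignoring FIFO ordering.
        System.out.println(waitSeconds / 3600.0 + " hours");
    }
}
```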

Maybe there should be a time-based limit on this delay in addition to the count of missed scheduling opportunities.
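One possible shape for such a check, combining the existing count with a wall-clock cap (all names and values here are a hypothetical sketch, not a proposed patch):

```java
// Sketch: fall back to rack-local when EITHER the missed-opportunity
// count or an elapsed-time limit is exceeded. Hypothetical, not a patch.
public class TimeBoundedDelay {
    static final long MAX_DELAY_MS = 60_000L; // assumed 60s cap

    static boolean acceptRackLocal(long missedOpportunities, int nodeLocalityDelay,
                                   long firstSkipTimeMs, long nowMs) {
        return missedOpportunities >= nodeLocalityDelay
            || (nowMs - firstSkipTimeMs) >= MAX_DELAY_MS;
    }

    public static void main(String[] args) {
        // Few missed opportunities, but 2 minutes elapsed: accept rack-local.
        System.out.println(acceptRackLocal(10, 4000, 0L, 120_000L)); // true
        // Few missed opportunities, only 5 seconds elapsed: keep waiting.
        System.out.println(acceptRackLocal(10, 4000, 0L, 5_000L));   // false
    }
}
```

This way a full cluster bounds the delay by elapsed time even when container completions (and hence missed-opportunity counts) accrue slowly.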

This message was sent by Atlassian JIRA
