hadoop-yarn-issues mailing list archives

From "Varun Vasudev (JIRA)" <j...@apache.org>
Subject [jira] [Created] (YARN-2628) Capacity scheduler with DominantResourceCalculator carries out reservation even though slots are free
Date Tue, 30 Sep 2014 21:06:33 GMT
Varun Vasudev created YARN-2628:
-----------------------------------

             Summary: Capacity scheduler with DominantResourceCalculator carries out reservation even though slots are free
                 Key: YARN-2628
                 URL: https://issues.apache.org/jira/browse/YARN-2628
             Project: Hadoop YARN
          Issue Type: Bug
          Components: capacityscheduler
    Affects Versions: 2.5.1
            Reporter: Varun Vasudev
            Assignee: Varun Vasudev


We've noticed that if you run the CapacityScheduler with the DominantResourceCalculator, sometimes
apps will end up with containers in a reserved state even though free slots are available.

The root cause seems to be this piece of code from CapacityScheduler.java -
{noformat}
    // Try to schedule more if there are no reservations to fulfill
    if (node.getReservedContainer() == null) {
      if (Resources.greaterThanOrEqual(calculator, getClusterResource(),
          node.getAvailableResource(), minimumAllocation)) {
        if (LOG.isDebugEnabled()) {
          LOG.debug("Trying to schedule on node: " + node.getNodeName() +
              ", available: " + node.getAvailableResource());
        }
        root.assignContainers(clusterResource, node, false);
      }
    } else {
      LOG.info("Skipping scheduling since node " + node.getNodeID() + 
          " is reserved by application " + 
          node.getReservedContainer().getContainerId().getApplicationAttemptId()
          );
    }
{noformat}

The code is meant to check whether a node has any slots available for containers. Because it uses
the greaterThanOrEqual function, which under the DominantResourceCalculator compares dominant shares
rather than each resource individually, we can end up in a situation where greaterThanOrEqual returns
true even though the node does not have enough CPU or memory to actually run the container.
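
To illustrate the mismatch, here is a minimal, self-contained sketch. It does not use the real YARN classes: Res, dominantShare, dominantGreaterThanOrEqual and fitsIn are hypothetical stand-ins, the cluster and node sizes are made up, and the comparison is simplified (tie-breaking on the secondary resource is left out). It only shows how a dominant-share comparison can pass for a node that has free memory but no free vcores, while a per-resource check fails.

```java
final class DominantShareDemo {

    /** Hypothetical, simplified stand-in for a YARN resource: memory in MB plus vcores. */
    static final class Res {
        final long memoryMb;
        final int vcores;

        Res(long memoryMb, int vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }
    }

    /** Dominant share: the larger of the per-resource fractions of the cluster total. */
    static double dominantShare(Res cluster, Res r) {
        double memoryShare = (double) r.memoryMb / cluster.memoryMb;
        double vcoreShare = (double) r.vcores / cluster.vcores;
        return Math.max(memoryShare, vcoreShare);
    }

    /** Simplified dominant-share comparison (assumption: no secondary tie-break). */
    static boolean dominantGreaterThanOrEqual(Res cluster, Res lhs, Res rhs) {
        return dominantShare(cluster, lhs) >= dominantShare(cluster, rhs);
    }

    /** Per-resource check: does `smaller` fit inside `bigger` on every dimension? */
    static boolean fitsIn(Res smaller, Res bigger) {
        return smaller.memoryMb <= bigger.memoryMb && smaller.vcores <= bigger.vcores;
    }

    public static void main(String[] args) {
        Res cluster = new Res(65536, 64);          // made-up cluster total
        Res minimumAllocation = new Res(1024, 1);  // made-up minimum container size
        Res available = new Res(8192, 0);          // node has memory free but no vcores

        // The dominant-share comparison passes because the memory share dominates.
        System.out.println("dominant >= : "
                + dominantGreaterThanOrEqual(cluster, available, minimumAllocation));

        // The per-resource check fails because 1 vcore does not fit into 0 vcores.
        System.out.println("fitsIn      : " + fitsIn(minimumAllocation, available));
    }
}
```

In this sketch the scheduler-style check would go on to attempt an assignment on the node, yet no container of the minimum size can actually be placed there.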



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
