hadoop-yarn-issues mailing list archives

From "Roger Hoover (JIRA)" <j...@apache.org>
Subject [jira] [Created] (YARN-466) Slave hostname mismatches in ResourceManager/Scheduler
Date Mon, 11 Mar 2013 23:21:14 GMT
Roger Hoover created YARN-466:

             Summary: Slave hostname mismatches in ResourceManager/Scheduler
                 Key: YARN-466
                 URL: https://issues.apache.org/jira/browse/YARN-466
             Project: Hadoop YARN
          Issue Type: Bug
          Components: resourcemanager, scheduler
            Reporter: Roger Hoover

The ResourceManager learns the hostname of a slave node when the NodeManager registers
itself, and the NodeManager appears to get that hostname by asking the OS. When a job is
submitted, I think the ApplicationMaster instead learns the hostname by doing a reverse DNS
lookup based on the slaves file.

Therefore, the ApplicationMaster submits requests for containers using the fully qualified
domain name (node1.foo.com), but the scheduler uses the OS hostname (node1) when checking
whether any requests are node-local.  As a result, the following lookup never finds a
node-local request:

ResourceRequest request = application.getResourceRequest(priority, node.getHostName());
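
To make the miss concrete, here is a minimal, self-contained sketch. It does not use the real YARN classes: LocalityLookupDemo, its map, and findNodeLocal() are hypothetical stand-ins, and it assumes the per-application request table is keyed by the exact hostname string.

```java
import java.util.HashMap;
import java.util.Map;

public class LocalityLookupDemo {
    // Hypothetical stand-in for an application's outstanding requests, keyed by host string.
    private final Map<String, Integer> requestsByHost = new HashMap<>();

    // Record a request for numContainers containers on the given host name.
    public void addRequest(String host, int numContainers) {
        requestsByHost.merge(host, numContainers, Integer::sum);
    }

    // Exact-string lookup (assumed to behave like the scheduler call above).
    // Returns null when no request was filed under exactly this name.
    public Integer findNodeLocal(String nodeHostName) {
        return requestsByHost.get(nodeHostName);
    }

    public static void main(String[] args) {
        LocalityLookupDemo app = new LocalityLookupDemo();
        app.addRequest("node1.foo.com", 2);                      // AM asks using the FQDN
        System.out.println(app.findNodeLocal("node1"));          // prints "null": OS hostname misses
        System.out.println(app.findNodeLocal("node1.foo.com"));  // prints "2"
    }
}
```

The request exists, but it is filed under a different string than the one the node reports, so the node-local path is never taken.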

I think it's unfriendly to ask users to make sure they configure hostnames to match fully
qualified domain names. There should be a way for the ApplicationMaster and NodeManager to
agree on the hostname.
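
One possible direction, purely as an illustration and not a proposed patch, is to have both sides reduce hostnames to a common form before comparing them. The sketch below assumes that comparing only the first DNS label is acceptable; HostNameKeys, shortName(), and sameHost() are hypothetical names, not part of YARN. It does not handle IP address literals, and it would treat hosts with the same short name in different domains as identical.

```java
import java.util.Locale;

public class HostNameKeys {
    // Hypothetical helper: reduce a hostname to its first DNS label, lower-cased.
    public static String shortName(String host) {
        String h = host.trim().toLowerCase(Locale.ROOT);
        int dot = h.indexOf('.');
        return dot < 0 ? h : h.substring(0, dot);
    }

    // True when two names refer to the same host under the short-name assumption.
    public static boolean sameHost(String a, String b) {
        return shortName(a).equals(shortName(b));
    }

    public static void main(String[] args) {
        System.out.println(sameHost("node1.foo.com", "node1")); // prints "true"
        System.out.println(sameHost("node1.foo.com", "node2")); // prints "false"
    }
}
```

Normalizing to the FQDN instead would avoid the cross-domain ambiguity, but it requires a resolver that both the ApplicationMaster and the NodeManager agree on.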

Steps to Reproduce:
1) Configure the OS hostname on slaves to differ from the fully qualified domain name.  For
example, if the FQDN for the slave is "node1.foo.com", set the hostname on the node to be
just "node1".
2) On submitting a job, observe that the AM submits resource requests using the FQDN (e.g.
"node1.foo.com").  You can add logging to the allocate() method of whatever scheduler you're
using:

for (ResourceRequest req : ask) {
  // The last argument is cut off in the archived message; req.getHostName() is assumed here.
  LOG.debug(String.format("Request %s for %d containers on %s", req, req.getNumContainers(),
      req.getHostName()));
}

3) Observe that when the scheduler checks for node locality (in the handle() method) using
FiCaSchedulerNode.getHostName(), the hostname it uses is the one set in the host OS (e.g.
"node1").  NOTE: if you're using FifoScheduler, this bug needs to be fixed first (https://issues.apache.org/jira/browse/YARN-412).

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
