Date: Mon, 11 Mar 2013 23:21:14 +0000 (UTC)
From: "Roger Hoover (JIRA)"
To: yarn-dev@hadoop.apache.org
Subject: [jira] [Created] (YARN-466) Slave hostname mismatches in ResourceManager/Scheduler

Roger Hoover created YARN-466:
---------------------------------

             Summary: Slave hostname mismatches in ResourceManager/Scheduler
                 Key: YARN-466
                 URL: https://issues.apache.org/jira/browse/YARN-466
             Project: Hadoop YARN
          Issue Type: Bug
          Components: resourcemanager, scheduler
            Reporter: Roger Hoover

The problem is that the ResourceManager learns the hostname of a slave node when the NodeManager registers itself, and it seems the NodeManager gets that hostname by asking the OS. When a job is submitted, I think the ApplicationMaster learns the hostname by doing a reverse DNS lookup based on the slaves file.
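To check which of the two names a given slave produces, the hypothetical diagnostic class below prints what Java reports for the local host via `InetAddress.getHostName()` (the locally configured name, e.g. "node1") and `InetAddress.getCanonicalHostName()` (a reverse DNS lookup, which may return an FQDN such as "node1.foo.com"). It is only a sketch: it assumes nothing about how the NodeManager or ApplicationMaster actually obtain their hostnames, and its output depends entirely on the machine's OS and DNS configuration.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical diagnostic, not part of YARN: shows the two names a slave
// may be known by. Output depends on local OS and DNS configuration.
public class HostnameViews {

    // Name stored in the InetAddress; for getLocalHost() this is the
    // hostname configured in the OS (e.g. "node1").
    public static String osName(InetAddress addr) {
        return addr.getHostName();
    }

    // Name from a reverse DNS lookup; may be an FQDN, or the textual IP
    // address if the lookup fails.
    public static String reverseDnsName(InetAddress addr) {
        return addr.getCanonicalHostName();
    }

    // Helper for building an address with a fixed name and no DNS lookup.
    public static InetAddress withName(String host, byte[] ip) {
        try {
            return InetAddress.getByAddress(host, ip);
        } catch (UnknownHostException e) {
            // Only thrown when the IP byte array has an illegal length.
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        try {
            InetAddress local = InetAddress.getLocalHost();
            System.out.println("OS hostname:      " + osName(local));
            System.out.println("Reverse DNS name: " + reverseDnsName(local));
        } catch (UnknownHostException e) {
            System.out.println("Could not resolve local host: " + e.getMessage());
        }
    }
}
```

On a node configured as in step 1 below, the two printed names would be expected to differ.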
Because the two components obtain the name differently, the ApplicationMaster submits requests for containers using the fully qualified domain name (node1.foo.com), but the scheduler uses the OS hostname (node1) when checking whether any requests are node-local. As a result, node-local requests are never found by this lookup:

    ResourceRequest request = application.getResourceRequest(priority, node.getHostName());

I think it's unfriendly to ask users to make sure their configured hostnames match their fully qualified domain names. There should be a way for the ApplicationMaster and NodeManager to agree on the hostname.

Steps to Reproduce:

1) Configure the OS hostname on the slaves to differ from the fully qualified domain name. For example, if the FQDN for the slave is "node1.foo.com", set the hostname on the node to just "node1".

2) Submit a job and observe that the AM submits resource requests using the FQDN (e.g. "node1.foo.com"). To see this, you can add logging to the allocate() method of whichever scheduler you're using:

    // Log the host name attached to each incoming resource request
    for (ResourceRequest req : ask) {
        LOG.debug(String.format("Request %s for %d containers on %s",
            req, req.getNumContainers(), req.getHostName()));
    }

3) Observe that when the scheduler checks for node locality (in the handle() method) using FiCaSchedulerNode.getHostName(), the hostname it uses is the one set in the host OS (e.g. "node1").

NOTE: if you're using FifoScheduler, the bug in YARN-412 (https://issues.apache.org/jira/browse/YARN-412) needs to be fixed first.
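The mismatch can be illustrated without a cluster using a simplified, hypothetical model of the scheduler's per-host request table (`NodeLocalLookup` and its methods are illustrative only, not YARN code). An exact-string lookup keyed by the FQDN the AM asked for misses when the node reports its short OS hostname. The `normalizedLookup` path shows one possible workaround, assumed here for illustration and not taken from YARN: reduce both names to their first DNS label before comparing.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical model of a per-host request table; NOT YARN code.
public class NodeLocalLookup {

    // Container counts keyed by the exact host name the AM asked for.
    private final Map<String, Integer> requestsByHost = new HashMap<>();
    // The same counts keyed by a normalized short name (assumed workaround).
    private final Map<String, Integer> requestsByShortName = new HashMap<>();

    // Assumed normalization: lower-case, drop a trailing dot, and keep only
    // the first DNS label. Collides if two domains reuse a short name.
    public static String shortName(String host) {
        String h = host.toLowerCase(Locale.ROOT);
        if (h.endsWith(".")) {
            h = h.substring(0, h.length() - 1);
        }
        int dot = h.indexOf('.');
        return dot < 0 ? h : h.substring(0, dot);
    }

    public void addRequest(String amHostName, int numContainers) {
        requestsByHost.merge(amHostName, numContainers, Integer::sum);
        requestsByShortName.merge(shortName(amHostName), numContainers, Integer::sum);
    }

    // Exact-match lookup, analogous to getResourceRequest(priority, node.getHostName()).
    public Integer exactLookup(String nodeHostName) {
        return requestsByHost.get(nodeHostName);
    }

    // Lookup after normalizing the node's reported name the same way.
    public Integer normalizedLookup(String nodeHostName) {
        return requestsByShortName.get(shortName(nodeHostName));
    }

    public static void main(String[] args) {
        NodeLocalLookup table = new NodeLocalLookup();
        table.addRequest("node1.foo.com", 3);                // AM asks using the FQDN
        System.out.println(table.exactLookup("node1"));      // prints "null": no node-local match
        System.out.println(table.normalizedLookup("node1")); // prints "3"
    }
}
```

Truncating to the short name is only safe when short names are unique across the cluster; resolving both sides to a canonical FQDN would avoid that collision risk at the cost of extra DNS lookups.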