Date: Mon, 10 Sep 2012 17:09:10 +1100 (NCT)
From: "Karthik Kambatla (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-80) Support delay scheduling for node locality in MR2's capacity scheduler

    [ https://issues.apache.org/jira/browse/YARN-80?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13451777#comment-13451777 ]

Karthik Kambatla commented on YARN-80:
--------------------------------------

bq. Perhaps the better way to do this is to have the AM be responsible for making the requests at different times. So for example on the first heartbeat after a container is needed only the node local request is made.
If it does not get it after a specific timeout (1 heartbeat by default) then a rack local request is added, and finally the global request is added after another timeout.

+1. Should we create a JIRA for this to make sure we don't miss out?

> Support delay scheduling for node locality in MR2's capacity scheduler
> ----------------------------------------------------------------------
>
>                 Key: YARN-80
>                 URL: https://issues.apache.org/jira/browse/YARN-80
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacityscheduler
>            Reporter: Todd Lipcon
>            Assignee: Arun C Murthy
>             Fix For: 2.0.2-alpha
>
>         Attachments: YARN-80.patch, YARN-80.patch
>
>
> The capacity scheduler in MR2 doesn't support delay scheduling for achieving node-level locality. So, jobs exhibit poor data locality even if they have good rack locality. Especially on clusters where disk throughput is much better than network capacity, this hurts overall job performance. We should optionally support node-level delay scheduling heuristics similar to what the fair scheduler implements in MR1.
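
For reference, the staged-request idea quoted at the top of the comment (node-local first, rack-local added after a timeout, the global request added after another timeout) could be sketched roughly as below. This is only an illustration in Python, not code from the YARN-80 patches or from any YARN API: the names `LEVELS`, `PendingContainer`, `active_levels` and `build_requests` are hypothetical, and the `"*"` string merely stands in for "any location".

```python
from dataclasses import dataclass

# Locality levels in the order they get added to the request.
# Illustrative labels only, not YARN identifiers.
LEVELS = ("node-local", "rack-local", "any")


@dataclass
class PendingContainer:
    node: str                   # preferred host
    rack: str                   # rack of the preferred host
    heartbeats_waited: int = 0  # heartbeats since the container was first needed


def active_levels(heartbeats_waited: int, timeout: int = 1) -> tuple:
    """Return the locality levels to request after waiting this long.

    One more level is unlocked each time `timeout` heartbeats pass,
    mirroring the "1 heartbeat by default" escalation in the comment.
    """
    if timeout < 1:
        raise ValueError("timeout must be at least one heartbeat")
    unlocked = min(heartbeats_waited // timeout, len(LEVELS) - 1)
    return LEVELS[: unlocked + 1]


def build_requests(pending: PendingContainer, timeout: int = 1) -> list:
    """Translate the unlocked levels into concrete request targets."""
    targets = {
        "node-local": pending.node,
        "rack-local": pending.rack,
        "any": "*",  # placeholder for a request that may be placed anywhere
    }
    return [targets[level] for level in active_levels(pending.heartbeats_waited, timeout)]
```

With the default timeout, a container that has waited zero heartbeats would ask only for its preferred node; after one heartbeat the rack is added, and after two the request may be satisfied anywhere. Keeping the escalation in the AM, as suggested, means the scheduler itself would not need to track per-request delays.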