From: "Amar Kamat (JIRA)"
To: core-dev@hadoop.apache.org
Date: Fri, 8 Feb 2008 04:38:08 -0800 (PST)
Subject: [jira] Commented: (HADOOP-2014) Job Tracker should prefer input-splits from overloaded racks
Message-ID: <4700667.1202474288507.JavaMail.jira@brutus>

[ https://issues.apache.org/jira/browse/HADOOP-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567020#action_12567020 ]

Amar Kamat commented on HADOOP-2014:
------------------------------------

bq.
In general, it will be useful to know the number of task trackers to which each split is local.

Can't we just keep, in the TIP itself, a count of the maximum number of trackers that have the split?

> Job Tracker should prefer input-splits from overloaded racks
> ------------------------------------------------------------
>
>                 Key: HADOOP-2014
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2014
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>            Assignee: Devaraj Das
>
> Currently, when the Job Tracker assigns a mapper task to a task tracker and there is no split local to the task tracker, the
> job tracker will find the first runnable task in the master task list and assign the task to the task tracker.
> The split for the task is not local to the task tracker, of course. However, the split may be local to other task trackers.
> Assigning that task to that task tracker may decrease the potential number of mapper attempts with data locality.
> The desired behavior in this situation is to choose a task whose split is not local to any task tracker.
> Resort to the current behavior only if no such task is found.
> In general, it will be useful to know the number of task trackers to which each split is local.
> To assign a task to a task tracker, the job tracker should first try to pick a task that is local to the task tracker and that has the minimal number of task trackers to which it is local. If no task is local to the task tracker, the job tracker should try to pick a task that has the minimal number of task trackers to which it is local.
> It is worthwhile to instrument the job tracker code to report the number of splits that are local to some task trackers.
> That should be the maximum number of tasks with data locality. By comparing that number with the actual number of
> data-local mappers launched, we can know the effectiveness of the job tracker scheduling.
> When we introduce rack locality, we should apply the same principle.
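
The selection rule proposed in the issue description can be sketched roughly as follows. This is only an illustration, not the JobTracker code: `pick_task`, `count_locatable_splits`, and the `split_hosts` mapping (pending task id to the set of task trackers holding its split) are hypothetical names, and the tie-break on task id is an assumption added to make the choice deterministic.

```python
from typing import Dict, Optional, Set


def pick_task(tracker: str, split_hosts: Dict[str, Set[str]]) -> Optional[str]:
    """Pick a pending task for `tracker`.

    split_hosts is a hypothetical mapping: pending task id -> set of
    task trackers to which that task's split is local.
    """
    if not split_hosts:
        return None
    # First preference: tasks whose split is local to this tracker.
    local = [t for t, hosts in split_hosts.items() if tracker in hosts]
    # Otherwise fall back to every pending task.
    candidates = local if local else list(split_hosts)
    # In both cases, prefer the task local to the fewest trackers.
    # Ties are broken by task id (an assumption, for determinism only).
    return min(candidates, key=lambda t: (len(split_hosts[t]), t))


def count_locatable_splits(split_hosts: Dict[str, Set[str]]) -> int:
    """Number of splits that are local to at least one task tracker."""
    return sum(1 for hosts in split_hosts.values() if hosts)


pending = {
    "m1": {"tt1", "tt2", "tt3"},
    "m2": {"tt1"},
    "m3": set(),
    "m4": {"tt2"},
}
choice_for_tt1 = pick_task("tt1", pending)  # local tasks exist for tt1
choice_for_tt9 = pick_task("tt9", pending)  # no local task for tt9
```

With this rule, a tracker that has no local split is handed the task whose split is local to the fewest trackers (here, one local to none), so tasks that could still run data-local elsewhere are kept back. `count_locatable_splits` corresponds to the instrumentation suggested in the description.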