From: "Hudson (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Date: Mon, 7 Sep 2009 11:06:57 -0700 (PDT)
Subject: [jira] Commented: (MAPREDUCE-936) Allow a load difference in fairshare scheduler

    [ https://issues.apache.org/jira/browse/MAPREDUCE-936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752230#action_12752230 ]

Hudson commented on MAPREDUCE-936:
----------------------------------

Integrated in Hadoop-Mapreduce-trunk #75 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/75/])

> Allow a load difference in fairshare scheduler
> ----------------------------------------------
>
>                 Key: MAPREDUCE-936
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-936
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/fair-share
>    Affects Versions: 0.20.1, 0.21.0, 0.22.0
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>             Fix For: 0.21.0
>
>         Attachments: MAPREDUCE-936.1.patch, MAPREDUCE-936.2.patch
>
>
> The problem we are facing: it takes a long time for all tasks of a job to get scheduled on the cluster, even if the cluster is almost empty.
> There are two reasons that together lead to this situation:
> 1. The load factor makes sure each TaskTracker (TT) runs the same number of tasks. (This is the part that this patch tries to change.)
> 2. The scheduler tries to schedule map tasks locally (first node-local, then rack-local). There is a wait time (mapred.fairscheduler.localitywait.node and mapred.fairscheduler.localitywait.rack, both around 10 sec in our conf) and an accumulated wait time (JobInfo.localityWait). The accumulated wait time is reset to 0 whenever a non-local map task is scheduled. That means it takes N * wait_time to schedule N non-local map tasks.
> Because of 1, many TTs will not be able to take more tasks, even if they have free slots. As a result, many of the map tasks cannot be scheduled locally.
> Because of 2, it's really hard to schedule a non-local task.
> As a result, we sometimes see that it takes more than 2 minutes to schedule all the mappers of a job.
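
The two effects described above can be sketched in Java as a rough model. This is not the fair scheduler's code or the MAPREDUCE-936 patch; it rests on stated assumptions. taskCap assumes a per-TT cap of ceil(trackerSlots * min(1, load + loadSlack)), with load = runnable tasks / cluster slots, where loadSlack is a hypothetical knob standing in for the "load difference" this issue asks for. secondsToLaunchNonLocal assumes the job's accumulated wait grows by one heartbeat interval per heartbeat and that a non-local task launches once the wait reaches a single waitSec threshold, ignoring the separate node and rack waits. All names here are illustrative.

```java
// Hypothetical model of the two effects in MAPREDUCE-936.
// Not the fair scheduler's implementation; the method names and the
// parameters loadSlack, waitSec and heartbeatSec are assumptions.
public class FairShareSketch {

    /**
     * Assumed per-TaskTracker cap: ceil(trackerSlots * min(1, load + loadSlack)),
     * where load = runnableTasks / clusterSlots. With loadSlack = 0 every TT is
     * held to the cluster-wide average; a positive slack lets a TT run a few
     * more tasks than the average before it is considered full.
     */
    public static int taskCap(int runnableTasks, int clusterSlots,
                              int trackerSlots, double loadSlack) {
        double load = (double) runnableTasks / clusterSlots;
        return (int) Math.ceil(trackerSlots * Math.min(1.0, load + loadSlack));
    }

    /**
     * Seconds needed to launch n map tasks that can only run non-locally,
     * assuming the accumulated wait grows by heartbeatSec per heartbeat,
     * a non-local launch is allowed once the wait reaches waitSec, and the
     * wait is reset to 0 after each non-local launch.
     */
    public static int secondsToLaunchNonLocal(int n, int waitSec, int heartbeatSec) {
        int clock = 0;            // simulated wall-clock time
        int accumulatedWait = 0;  // stands in for the job's accumulated locality wait
        int launched = 0;
        while (launched < n) {
            clock += heartbeatSec;            // next heartbeat arrives
            accumulatedWait += heartbeatSec;  // job was skipped for locality again
            if (accumulatedWait >= waitSec) {
                launched++;                   // one non-local map task launched
                accumulatedWait = 0;          // reset, so the next task waits again
            }
        }
        return clock;
    }

    public static void main(String[] args) {
        // 100 TTs x 8 map slots = 800 slots, 80 runnable maps: load = 0.1
        System.out.println(taskCap(80, 800, 8, 0.0));            // prints 1
        System.out.println(taskCap(80, 800, 8, 0.2));            // prints 3
        // 12 non-local maps, 10 s wait, 1 s heartbeats
        System.out.println(secondsToLaunchNonLocal(12, 10, 1));  // prints 120
    }
}
```

Under these assumptions, 80 runnable maps on 800 slots cap each 8-slot TT at 1 running map. A TT that holds a job's input therefore cannot take more of that job's tasks, even with 7 free slots; a slack of 0.2 raises the cap to 3. Launching 12 maps that can only run non-locally with a 10 s wait takes 120 s, which matches the N * wait_time estimate.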