From: "Matei Zaharia (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Date: Tue, 30 Nov 2010 15:56:13 -0500 (EST)
Subject: [jira] Commented: (MAPREDUCE-2205) FairScheduler should only preempt tasks for pools/jobs that are up next for scheduling

[
https://issues.apache.org/jira/browse/MAPREDUCE-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965432#action_12965432 ]

Matei Zaharia commented on MAPREDUCE-2205:
------------------------------------------

Just to add a bit: one other complication with the "walk down the ordering" approach is that the ordering may change over time as jobs get slots and as their old tasks finish. It's hard to look at the ordering now and know exactly which jobs will launch tasks before the next preemption interval. In contrast, with the approach I proposed, you don't need any such estimation; we directly fix the issue that a timed-out job may not be at the head of the ordering.

Does this make sense?

> FairScheduler should only preempt tasks for pools/jobs that are up next for scheduling
> --------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2205
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2205
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/fair-share
>            Reporter: Joydeep Sen Sarma
>
> We have hit a problem with the preemption implementation in the FairScheduler where the following happens:
> # job X runs short of fair share or min share and requests/causes N tasks to be preempted
> # when slots are then scheduled, tasks from some other job are actually scheduled
> # after preemption_interval has passed, job X finds it's still underscheduled and requests preemption. goto 1.
> This has caused widespread preemption of tasks and the cluster going from high utilization to low utilization in a few minutes.
> Some of the problems are specific to our internal version of Hadoop (still 0.20, without the hierarchical FairScheduler), but I think the issue here is generic (I just took a look at the trunk assignTasks and tasksToPreempt routines).
> The basic problem seems to be that the logic of assignTasks+FairShareComparator is not consistent with the logic in tasksToPreempt(). The latter can choose to preempt tasks on behalf of jobs that may not be first up for scheduling according to the FairShareComparator. Understanding whether these two separate pieces of logic are consistent, and keeping them that way, is difficult.
> It seems that a much safer preemption implementation is to walk the jobs in the order they would be scheduled on the next heartbeat, and only preempt for jobs that are at the head of this sorted queue. In MAPREDUCE-2048 we have already introduced a pre-sorted list of jobs ordered by current scheduling priority. It seems much easier to preempt only on behalf of jobs at the head of this sorted list.
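A minimal Java sketch of the head-of-queue rule from the description, plus a slot-by-slot simulation of the comment's point that the ordering shifts as jobs receive slots. Everything here is hypothetical and simplified: `Job`, `SCHEDULING_ORDER`, `tasksToPreempt` and `assignSlots` are illustrative names, not FairScheduler APIs. The comparator is a rough stand-in for FairShareComparator (no weights, pools, or fair-share computation). The `starvedPastTimeout` flag stands in for the scheduler's preemption-timeout check. "Jobs at the head" is narrowed to the single first job. Task completions are not modeled.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class HeadOnlyPreemption {

    // Hypothetical, simplified job snapshot; not the real FairScheduler job info.
    static final class Job {
        final String name;
        final int minShare;               // slots the job is guaranteed
        final boolean starvedPastTimeout; // assumed: below min share longer than the preemption timeout
        int running;                      // tasks currently running

        Job(String name, int running, int minShare, boolean starvedPastTimeout) {
            this.name = name;
            this.running = running;
            this.minShare = minShare;
            this.starvedPastTimeout = starvedPastTimeout;
        }

        boolean needy() { return running < minShare; }

        double minShareRatio() { return running / (double) Math.max(minShare, 1); }
    }

    // Simplified stand-in for FairShareComparator: jobs below their min share come
    // first, ordered by running/minShare; the rest by running-task count; ties by name.
    static final Comparator<Job> SCHEDULING_ORDER = (a, b) -> {
        if (a.needy() != b.needy()) {
            return a.needy() ? -1 : 1;
        }
        int c = a.needy()
                ? Double.compare(a.minShareRatio(), b.minShareRatio())
                : Integer.compare(a.running, b.running);
        return c != 0 ? c : a.name.compareTo(b.name);
    };

    // Head-only rule: preempt only on behalf of the job that would be offered the
    // next free slot, and only for its min-share deficit. Otherwise preempt nothing.
    static int tasksToPreempt(List<Job> jobs) {
        Optional<Job> head = jobs.stream().min(SCHEDULING_ORDER);
        return head.filter(j -> j.needy() && j.starvedPastTimeout)
                   .map(j -> j.minShare - j.running)
                   .orElse(0);
    }

    // Hands out free slots one at a time, re-sorting after every launch, and
    // records which job received each slot (mutates the jobs' running counts).
    static List<String> assignSlots(List<Job> jobs, int freeSlots) {
        List<Job> queue = new ArrayList<>(jobs);
        List<String> launched = new ArrayList<>();
        for (int i = 0; i < freeSlots && !queue.isEmpty(); i++) {
            queue.sort(SCHEDULING_ORDER);
            Job head = queue.get(0);
            head.running++;
            launched.add(head.name);
        }
        return launched;
    }

    public static void main(String[] args) {
        // X has passed its timeout, but Y is further below its min share and is
        // at the head, so the head-only rule preempts nothing on X's behalf.
        List<Job> jobs = List.of(new Job("X", 4, 10, true), new Job("Y", 1, 10, false));
        System.out.println(tasksToPreempt(jobs)); // 0
        System.out.println(assignSlots(jobs, 6)); // [Y, Y, Y, X, Y, X]
    }
}
```

In this toy run, preempting for X would free slots that go to Y, which is the loop described in steps 1 to 3 above. The slot trace also shows the head moving from Y to X once Y's min-share ratio catches up, so a snapshot of the ordering taken before the first launch does not say which job will launch the fourth task.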