Message-ID: <1521762973.1238475951101.JavaMail.jira@brutus>
Date: Mon, 30 Mar 2009 22:05:51 -0700 (PDT)
From: "Vinod K V (JIRA)"
To: core-dev@hadoop.apache.org
Reply-To: core-dev@hadoop.apache.org
Subject: [jira] Commented: (HADOOP-4665) Add preemption to the fair scheduler
In-Reply-To: <927835526.1226793164136.JavaMail.jira@brutus>

[ https://issues.apache.org/jira/browse/HADOOP-4665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12693990#action_12693990 ]

Vinod K V commented on
HADOOP-4665:
-----------------------------------

Sorry for the late reply; I was away for a long weekend. Your explanation of the min-shares of pools cleared up my doubts. Thanks!

I have discussed the overall approach of this patch with Hemant, Sreekanth and Rahul offline. We agreed on one slight improvement: the preemption patch of the capacity-scheduler treats the preemption timeout as a kind of SLA for the pool/queue, and so leaves a couple of heartbeats for slots to become free after it preempts a task. Can we do something like that here? Essentially, the proposal is to preempt a task when a job's fairshare/minshare is not met within PREEMPTION_TIMEOUT minus 2 or 3 heartbeats.

A few other code comments:
- Can we make PREEMPTION_INTERVAL configurable?
- The check for whether FairScheduler will preempt at all (preemptionEnabled && !useFifo) is done deep inside: all the stats are calculated, and only then is preemption skipped if it is not needed. Can we move this check to the beginning of the preemptTasksIfNecessary() method, or into the update() method itself?
- The class FairSchedulerEventLog can be package-private. So can all of its methods (init, log, shutdown and isEnabled); they don't need to be public as of now.

Again, sorry for the late reply. I appreciate your patience.

> Add preemption to the fair scheduler
> ------------------------------------
>
> Key: HADOOP-4665
> URL: https://issues.apache.org/jira/browse/HADOOP-4665
> Project: Hadoop Core
> Issue Type: New Feature
> Components: contrib/fair-share
> Reporter: Matei Zaharia
> Assignee: Matei Zaharia
> Fix For: 0.21.0
>
> Attachments: fs-preemption-v0.patch, hadoop-4665-v1.patch, hadoop-4665-v1b.patch, hadoop-4665-v2.patch, hadoop-4665-v3.patch, hadoop-4665-v4.patch
>
>
> Task preemption is necessary in a multi-user Hadoop cluster for two reasons: users might submit long-running tasks by mistake (e.g. an infinite loop in a map program), or tasks may be long due to having to process large amounts of data. The Fair Scheduler (HADOOP-3746) has a concept of guaranteed capacity for certain queues, as well as a goal of providing good performance for interactive jobs on average through fair sharing. Therefore, it will support preempting under two conditions:
> 1) A job isn't getting its _guaranteed_ share of the cluster for at least T1 seconds.
> 2) A job is getting significantly less than its _fair_ share for T2 seconds (e.g. less than half its share).
> T1 will be chosen smaller than T2 (and will be configurable per queue) to meet guarantees quickly. T2 is meant as a last resort in case non-critical jobs in queues with no guaranteed capacity are being starved.
> When deciding which tasks to kill to make room for the job, we will use the following heuristics:
> - Look for tasks to kill only in jobs that have more than their fair share, ordering these by deficit (most overscheduled jobs first).
> - For maps: kill tasks that have run for the least amount of time (limiting wasted time).
> - For reduces: similar to maps, but give extra preference for reduces in the copy phase where there is not much map output per task (at Facebook, we have observed this to be the main time we need preemption - when a job has a long map phase and its reducers are mostly sitting idle and filling up slots).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
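
For illustration only, here is a rough sketch of three ideas from this message: shortening the preemption timeout by a margin of heartbeats, checking (preemptionEnabled && !useFifo) before doing any other work, and killing the tasks that have run for the least time first. It is not taken from the HADOOP-4665 patch. The class PreemptionSketch, the Task type, and every method and parameter name are hypothetical, and it assumes that "2/3 heartbeats" means two or three heartbeats. It uses Java 8 syntax for brevity.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: none of these class, method, or parameter names come
// from the HADOOP-4665 patch or from the FairScheduler source.
final class PreemptionSketch {

    /** A running task, reduced to the one property the heuristic needs. */
    static final class Task {
        final String id;
        final long runtimeMs;

        Task(String id, long runtimeMs) {
            this.id = id;
            this.runtimeMs = runtimeMs;
        }
    }

    /** Timeout minus a margin of heartbeats, clamped at zero. */
    static long effectiveTimeoutMs(long timeoutMs, long heartbeatMs, int marginHeartbeats) {
        return Math.max(0L, timeoutMs - (long) marginHeartbeats * heartbeatMs);
    }

    /**
     * Decides whether a job that has been below its share for starvedForMs
     * should trigger preemption. The enabled/FIFO check comes first, so the
     * timeout is not even computed when preemption cannot happen.
     */
    static boolean shouldPreempt(boolean preemptionEnabled, boolean useFifo,
                                 long starvedForMs, long timeoutMs,
                                 long heartbeatMs, int marginHeartbeats) {
        if (!preemptionEnabled || useFifo) {
            return false;
        }
        return starvedForMs >= effectiveTimeoutMs(timeoutMs, heartbeatMs, marginHeartbeats);
    }

    /** Returns up to `needed` candidates, shortest runtime first. */
    static List<Task> youngestFirst(List<Task> candidates, int needed) {
        List<Task> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingLong((Task t) -> t.runtimeMs));
        int count = Math.min(Math.max(needed, 0), sorted.size());
        return new ArrayList<>(sorted.subList(0, count));
    }
}
```

With made-up numbers (a 15000 ms timeout, a 3000 ms heartbeat and a margin of 2 heartbeats), effectiveTimeoutMs returns 9000, so a job starved for 9000 ms or longer would trigger preemption, leaving roughly two heartbeats for the killed tasks' slots to free up before the full timeout expires.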