Date: Tue, 23 Feb 2010 09:50:28 +0000 (UTC)
From: "Adam Kramer (JIRA)"
To: common-issues@hadoop.apache.org
Subject: [jira] Commented: (HADOOP-6592) Scheduler: Pause button desirable

[
https://issues.apache.org/jira/browse/HADOOP-6592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837174#action_12837174 ]

Adam Kramer commented on HADOOP-6592:
-------------------------------------

bq. This meter would still be running even if the job is paused, so tasks or the entire job could be killed if its temp storage usage quota is reached.

In general, this seems like good practice: when temporary storage fills up, the tasks using that storage should be killed in FIFO order. This means that a paused job's tasks could fail, in which case each failed map task would be returned to the "pending" queue.

bq. You need pre-emption, not pause.

Pre-emption doesn't address a potential desire to pause a job that's currently running when there are no other jobs waiting. Say my job requires 3000 map tasks and my cluster has 1000 map slots. Once the first 1000 start running, I may want to pause the job in case other users submit jobs, so that at least some slots are free when they start theirs. That is what I meant by "because I expect other tasks to arrive": I meant other JOBS, submitted by other users, to whom I would like to be polite.

Or, if another user is being impolite, it would be good to "pause" that user's job until I can walk by his desk and ask whether there's a way to run it that doesn't thrash the cluster. If it turns out there isn't, he could keep his job paused until the workday is over (and the cluster is mostly free).

A good stopgap in the meantime, one that captures most of the desirable qualities of pause, would be the ability to set a ceiling on the number of map tasks my job can use simultaneously.
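The FIFO-kill policy for temporary storage can be sketched as follows. This is a minimal illustration, not Hadoop code: it assumes a per-job quota measured in bytes, reads "FIFO" as "the earliest-started task is killed first", and assumes a killed task is appended to the back of the pending queue. `RunningTask` and `enforce_temp_quota` are hypothetical names. Because the function never looks at whether the job is paused, a paused job's running tasks are still subject to the quota, which is the situation the quoted comment describes.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class RunningTask:
    task_id: str
    start_time: float   # launch time; earlier means "first in"
    temp_bytes: int     # temporary storage currently held (assumed unit: bytes)


def enforce_temp_quota(running: List[RunningTask], pending: Deque[str],
                       quota_bytes: int) -> List[str]:
    """Hypothetical policy: kill a job's running tasks, earliest-started first,
    until its total temporary-storage usage fits within the quota. Killed tasks
    are re-queued as pending. The job's pause state is deliberately ignored."""
    total = sum(t.temp_bytes for t in running)
    killed: List[str] = []
    # Iterate over a sorted copy so removing from `running` is safe.
    for task in sorted(running, key=lambda t: t.start_time):
        if total <= quota_bytes:
            break
        running.remove(task)            # the task is killed...
        pending.append(task.task_id)    # ...and goes back to the pending queue
        total -= task.temp_bytes
        killed.append(task.task_id)
    return killed


running = [
    RunningTask("m0", start_time=1.0, temp_bytes=400),
    RunningTask("m1", start_time=2.0, temp_bytes=300),
    RunningTask("m2", start_time=3.0, temp_bytes=300),
]
pending = deque(["m3"])
# 1000 bytes in use against a 700-byte quota: only the oldest task, m0, is killed.
killed = enforce_temp_quota(running, pending, quota_bytes=700)
```

Killing the oldest task first is only one reading of "FIFO"; a real scheduler might instead prefer the task that has made the least progress, since the oldest task has likely done the most work that would be lost.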
> Scheduler: Pause button desirable
> ---------------------------------
>
>                 Key: HADOOP-6592
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6592
>             Project: Hadoop Common
>          Issue Type: Wish
>            Reporter: Adam Kramer
>            Priority: Minor
>
> It would be lovely if, from the jobtracker page, I could click a button that's not "kill" or "fail" but ... "pause."
> The pause button would stop a given job from starting any more mappers or reducers. Its remaining tasks would all wait in the "pending" stage until the job is "un-paused." Currently running tasks would continue to run and then complete, freeing their resources for other jobs.
> This would help a lot on systems (esp. Hive) in which one or two jobs are hogging a lot of mappers or reducers. The tasks those jobs already have would finish, other jobs could "catch up," and then the hogging jobs could be unpaused for a while. This would also let users throttle their own jobs when they need a lot of resources but have time to spare.
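The pause semantics described in the issue, together with a per-job ceiling on concurrently running tasks, can be sketched with a toy slot scheduler. This is a hypothetical model, not the JobTracker's actual API: `Job`, `Cluster`, `paused`, and `max_running` are illustrative names, the cluster is reduced to a count of interchangeable slots, and free slots are handed to jobs in submission order. Pausing only blocks new launches; running tasks finish normally and their slots go to other jobs.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional, Set, Tuple


@dataclass
class Job:
    name: str
    pending: Deque[str]                        # tasks waiting for a slot
    running: Set[str] = field(default_factory=set)
    paused: bool = False                       # hypothetical "pause" flag
    max_running: Optional[int] = None          # hypothetical ceiling; None = no cap

    def can_launch(self) -> bool:
        """Start a new task only if unpaused, work is pending, and below the cap.
        Pausing never touches tasks that are already running."""
        if self.paused or not self.pending:
            return False
        return self.max_running is None or len(self.running) < self.max_running


class Cluster:
    def __init__(self, slots: int) -> None:
        self.free = slots
        self.jobs: List[Job] = []              # kept in submission (FIFO) order

    def submit(self, job: Job) -> None:
        self.jobs.append(job)

    def schedule(self) -> List[Tuple[str, str]]:
        """Hand out free slots, skipping paused or capped jobs."""
        launched = []
        for job in self.jobs:
            while self.free > 0 and job.can_launch():
                task = job.pending.popleft()
                job.running.add(task)
                self.free -= 1
                launched.append((job.name, task))
        return launched

    def finish(self, job: Job, task: str) -> None:
        """A running task completes and releases its slot."""
        job.running.remove(task)
        self.free += 1


cluster = Cluster(slots=4)
hog = Job("hog", deque(f"m{i}" for i in range(10)))
cluster.submit(hog)
first = cluster.schedule()        # hog takes all four slots

other = Job("other", deque(["m0", "m1", "m2"]))
cluster.submit(other)
hog.paused = True                 # hog's running tasks keep running
cluster.finish(hog, "m0")
cluster.finish(hog, "m1")
second = cluster.schedule()       # the freed slots go to "other", not "hog"

hog.paused = False
hog.max_running = 2               # unpaused, but capped at its current two tasks
cluster.finish(other, "m0")
third = cluster.schedule()        # hog is at its cap, so "other" gets the slot
```

In this model, setting `max_running` to 0 blocks new launches exactly as `paused` does, which is one way to see why a ceiling captures most of what a pause button offers; an explicit flag mainly makes the intent visible and easy to undo.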