Message-ID: <252818596.1225113044441.JavaMail.jira@brutus>
Date: Mon, 27 Oct 2008 06:10:44 -0700 (PDT)
From: "Owen O'Malley (JIRA)"
To: core-dev@hadoop.apache.org
Reply-To: core-dev@hadoop.apache.org
Subject: [jira] Commented: (HADOOP-4472) Should we move out the creation of setup/cleanup tasks from JobInProgress.initTasks()?
In-Reply-To: <301694706.1224583426490.JavaMail.jira@brutus>

    [ https://issues.apache.org/jira/browse/HADOOP-4472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642908#action_12642908 ]

Owen O'Malley commented on HADOOP-4472:
---------------------------------------

{quote}
The IDs of setup and cleanup tasks can be -1 and -2.
{quote}

I don't think that is right. I think we should have a SetupTask and a CleanupTask that are both id = 0. The isMap boolean needs to be replaced with an enumeration: {SETUP, MAP, REDUCE, CLEANUP}. There should be TIPs associated with both setup and cleanup.

{quote}
JobTracker does not inform the listeners when the job is submitted, and it waits for the setup completion.
{quote}

The Scheduler should be in control of when the job is initialized. Therefore it must be notified when the job is submitted. Furthermore, it should be notified again when the setup task is finished and the rest of the job is runnable.

{quote}
JT can poll the waiting jobs to see if setup is complete for them
{quote}

I don't see why the JT should ever poll the state. It knows the state changed via the heartbeat. Do you mean the scheduler? That should be done via an event.

> Should we move out the creation of setup/cleanup tasks from JobInProgress.initTasks()?
> ---------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4472
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4472
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Vivek Ratan
>            Assignee: Amareshwari Sriramadasu
>
> JobInProgress.initTasks() creates TIPs for map and reduce tasks, and also the newly-introduced setup and cleanup tasks.
> initTasks() is called by the schedulers because, for memory-optimization reasons, schedulers may choose to initialize M/R tasks at various moments (the Capacity Scheduler, for example, calls initTasks() just when it considers a job for running). One can say that Schedulers 'own' the initialization of M/R tasks in a job. However, the JT 'owns' the setup and cleanup tasks (it schedules them, and Schedulers are unaware of these tasks). This causes a problematic dependency between the JT and a Scheduler. For example, the Capacity Scheduler calls initTasks() and immediately calls JobInProgress.obtainNewMapTask for a map task. This is a problem today, because we cannot run any map or reduce tasks before the setup task is run, which the Capacity Scheduler is not aware of.
> Either all Schedulers are explicitly aware of setup/cleanup tasks and their dependencies with M/R tasks (in which case, Schedulers 'own' the creation and scheduling of all these tasks correctly), or the JT 'owns' the setup/cleanup tasks and Schedulers are completely unaware of them (in which case, the creation of setup/cleanup tasks must be moved out of initTasks into a separate method which is called by the JT).
> I think the latter is the right way to go (unless we implement HADOOP-4421, in which case the former option may be viable as well).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
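
For illustration only: the two suggestions in the comment above (replacing the isMap boolean with a task-type enumeration, and notifying the Scheduler through events instead of polling) could look roughly like the Java sketch below. Apart from the four enumeration values named in the comment, every name here (TaskType, JobEvent, JobListenerSketch, TaskIdSketch, JobSketch, SchedulerEventDemo) is hypothetical and is not the actual Hadoop API. The sketch assumes only what the issue states: map and reduce tasks cannot run before the setup task has finished.

```java
import java.util.ArrayList;
import java.util.List;

// Small demo driver: a "scheduler" that reacts to events rather than polling.
final class SchedulerEventDemo {
    public static void main(String[] args) {
        JobSketch job = new JobSketch();
        job.addListener((j, e) -> System.out.println("scheduler notified: " + e));
        job.submit();              // scheduler learns the job exists
        job.setupTaskCompleted();  // scheduler learns map/reduce tasks are runnable
        System.out.println(new TaskIdSketch(TaskType.SETUP, 0)
                + " and " + new TaskIdSketch(TaskType.CLEANUP, 0));
    }
}

// Hypothetical replacement for the isMap boolean: one value per kind of task.
enum TaskType { SETUP, MAP, REDUCE, CLEANUP }

// Hypothetical lifecycle events a Scheduler could be notified of.
enum JobEvent { SUBMITTED, SETUP_COMPLETE }

// Hypothetical listener interface; not the real Hadoop listener API.
interface JobListenerSketch {
    void onEvent(JobSketch job, JobEvent event);
}

// Task identity as (type, id), so the setup and cleanup tasks can both use id 0
// without colliding with map 0 or reduce 0.
final class TaskIdSketch {
    final TaskType type;
    final int id;

    TaskIdSketch(TaskType type, int id) {
        this.type = type;
        this.id = id;
    }

    @Override
    public String toString() {
        return type + "_" + id;
    }
}

// Toy stand-in for JobInProgress that pushes events to its listeners.
final class JobSketch {
    private final List<JobListenerSketch> listeners = new ArrayList<>();
    private boolean setupDone = false;

    void addListener(JobListenerSketch listener) {
        listeners.add(listener);
    }

    // First notification: the job has been submitted.
    void submit() {
        fire(JobEvent.SUBMITTED);
    }

    // Second notification: in the real system this would be triggered by a
    // heartbeat reporting that the setup task finished.
    void setupTaskCompleted() {
        setupDone = true;
        fire(JobEvent.SETUP_COMPLETE);
    }

    // Map and reduce tasks become runnable only once setup is done.
    boolean isRunnable(TaskType type) {
        switch (type) {
            case SETUP:
                return !setupDone;
            case MAP:
            case REDUCE:
                return setupDone;
            default:
                return false; // CLEANUP ordering is not modeled in this sketch
        }
    }

    private void fire(JobEvent event) {
        for (JobListenerSketch listener : listeners) {
            listener.onEvent(this, event);
        }
    }
}
```

With an arrangement like this, a scheduler that calls the equivalent of obtainNewMapTask right after initialization could first check isRunnable(TaskType.MAP), or simply wait for the SETUP_COMPLETE event.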