Message-ID: <11207244.1151359110942.JavaMail.jira@brutus>
Date: Mon, 26 Jun 2006 21:58:30 +0000 (GMT)
From: "Runping Qi (JIRA)"
To: hadoop-dev@lucene.apache.org
Reply-To: hadoop-dev@lucene.apache.org
Subject: [jira] Updated: (HADOOP-322) Need a job control utility to submit and monitor a group of jobs which have DAG dependency
In-Reply-To: <4931825.1151126129857.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8

    [ http://issues.apache.org/jira/browse/HADOOP-322?page=all ]

Runping Qi updated HADOOP-322:
------------------------------

    Attachment:
job_control_patch.txt

My patch is attached. It has two classes: hadoop.jobs.Job and hadoop.jobs.JobControl.

The Job class encapsulates a MapReduce job and its dependencies. It monitors the states of the jobs it depends on and updates its own state accordingly. A job starts in the WAITING state. If it has no depending jobs, or all of its depending jobs are in the SUCCESS state, the job moves to the READY state. If any depending job fails, the job fails too. From the READY state, the job can be submitted to Hadoop for execution, at which point it enters the RUNNING state. From the RUNNING state, the job moves to the SUCCESS or FAILED state, depending on the outcome of the job execution.

The JobControl class encapsulates a set of MapReduce jobs and their dependencies. It tracks the states of the jobs by placing them into different tables according to their states. This class provides APIs for the client application to add jobs to the group and to retrieve the jobs in each state. When a job is added, it is assigned an ID unique within the group. The class runs a thread that submits jobs when they become ready, monitors the states of the running jobs, and updates the states of waiting jobs as their depending jobs change state. The class also provides APIs for suspending/resuming the thread, and for stopping the thread.
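The state-transition rule described above can be sketched in plain Java. This is only an illustration of the rule, not the patch's actual code; the names State, SimpleJob, dependingJobs, and checkState are made up for the sketch.

    import java.util.ArrayList;
    import java.util.List;

    // The five states a job passes through, as described in the comment.
    enum State { WAITING, READY, RUNNING, SUCCESS, FAILED }

    // Hypothetical sketch of a job that tracks its depending jobs.
    class SimpleJob {
        State state = State.WAITING;
        final List<SimpleJob> dependingJobs = new ArrayList<SimpleJob>();

        // Re-evaluate a WAITING job against its dependencies:
        // any FAILED dependency -> FAILED; all SUCCESS -> READY;
        // otherwise it stays WAITING.
        State checkState() {
            if (state != State.WAITING) return state;
            boolean allDone = true;
            for (SimpleJob dep : dependingJobs) {
                if (dep.state == State.FAILED) {
                    state = State.FAILED;
                    return state;
                }
                if (dep.state != State.SUCCESS) allDone = false;
            }
            if (allDone) state = State.READY;
            return state;
        }
    }

A job with no depending jobs becomes READY on the first check, matching the rule that an independent job is immediately eligible for submission.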
A typical use scenario is as follows:

- create a set of Map/Reduce job confs
- create a Job object per Map/Reduce job conf with the proper dependencies
- create a JobControl object
- add the Job objects to the JobControl object
- create a control thread and run it:

    Thread theController = new Thread(theControl);
    theController.start();
    while (!theControl.allFinished()) {
        System.out.println("Jobs in waiting state: " + theControl.getWaitingJobs().size());
        System.out.println("Jobs in ready state: " + theControl.getReadyJobs().size());
        System.out.println("Jobs in running state: " + theControl.getRunningJobs().size());
        System.out.println("Jobs in success state: " + theControl.getSuccessfulJobs().size());
        System.out.println("Jobs in failed state: " + theControl.getFailedJobs().size());
        System.out.println("\n");
        try {
            Thread.sleep(60000);
        } catch (Exception e) {
        }
    }
    theControl.stop();

> Need a job control utility to submit and monitor a group of jobs which have DAG dependency
> ------------------------------------------------------------------------------------------
>
>          Key: HADOOP-322
>          URL: http://issues.apache.org/jira/browse/HADOOP-322
>      Project: Hadoop
>         Type: New Feature
>     Reporter: Runping Qi
>     Assignee: Runping Qi
>  Attachments: job_control_patch.txt
>
> In my applications, some jobs depend on the outputs of other jobs, so the job dependencies form a DAG. A job is ready to run if and only if it has no dependencies or all the jobs it depends on have finished successfully. To help schedule and monitor a group of jobs like that, I am thinking of implementing a utility that:
> - accepts jobs with dependency specifications
> - monitors job status
> - submits jobs when they are ready
> With such a utility, the application can construct its jobs, specify their dependencies, and then hand the jobs to the utility class. The utility takes care of the details of job submission.
> I'll post my design sketch for comments/suggestions.
> Eventually, I'll submit a patch for the utility.
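The per-state bookkeeping that JobControl is described as doing (jobs filed into tables by state, each assigned an ID unique within the group) can be sketched in plain Java. The names SimpleJobControl, addJob, and size are made up for this sketch; they are not the API in the attached patch.

    import java.util.HashMap;
    import java.util.Map;

    // Illustrative sketch only: jobs keyed by a group-unique id,
    // partitioned into one table per state.
    class SimpleJobControl {
        private long nextId = 0;
        private final Map<String, Map<Long, String>> tables =
            new HashMap<String, Map<Long, String>>();

        SimpleJobControl() {
            for (String s : new String[] {"waiting", "ready", "running", "success", "failed"}) {
                tables.put(s, new HashMap<Long, String>());
            }
        }

        // Adding a job assigns it an id unique within the group
        // and files it in the waiting table, since jobs start WAITING.
        long addJob(String jobName) {
            long id = nextId++;
            tables.get("waiting").put(id, jobName);
            return id;
        }

        // Number of jobs currently filed under the given state.
        int size(String state) {
            return tables.get(state).size();
        }
    }

The control thread would move entries between tables as job states change; retrieval methods like the getWaitingJobs()/getReadyJobs() calls in the usage example above would simply read the corresponding table.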
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira