From: "Vivek Ratan (JIRA)"
To: core-dev@hadoop.apache.org
Reply-To: core-dev@hadoop.apache.org
Date: Mon, 6 Oct 2008 09:07:44 -0700 (PDT)
Subject: [jira] Commented: (HADOOP-4053) Schedulers need to know when a job has completed
Message-ID: <234094961.1223309264385.JavaMail.jira@brutus>
In-Reply-To: <156825368.1220328524223.JavaMail.jira@brutus>
Mailing-List: contact core-dev-help@hadoop.apache.org; run by ezmlm

    [ https://issues.apache.org/jira/browse/HADOOP-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637124#action_12637124 ]

Vivek Ratan commented on 
HADOOP-4053:
-------------------------------------

I had a few questions/comments on _JobStatusChangeEvent_.

- I agree with Hemanth that the old JobStatus and new JobStatus should be passed in explicitly. Otherwise there are hidden dependencies in the calling sequence.

- It's not clear to me how we're naming the enum values for Events in _JobStatusChangeEvent_. What does RUN_STATE mean? Does it mean an event that causes a Job's run state to change? If so, do you mean that the job was in a running state and changed to something else, or that its state changed to a running state? I see the same enum value used for both. In CapacityScheduler.getTaskFromQueue(), you add a RUN_STATE event when the job's state changes from PREP to RUNNING. In JobTracker.finalizeJob(), you add a RUN_STATE event when the job's state changes from RUNNING to something else. I think you need to use separate events and name the events a little more consistently. Or else, just rename the enum value to STATE_CHANGE, which can be used for any state change. This should be OK, given that you have an old and new job status and can figure out how the state changed. In general, the enum values should be verbs: FINISH_TIME_CHANGED, rather than FINISH_TIME.

- I don't feel very comfortable with the fact that _JobStatusChangeEvent_ can contain multiple Events. I see that the only use case is in job recovery, when more than one attribute of a job status has changed. But, abstractly, having a single _JobStatusChangeEvent_ object handle multiple events is not intuitive. Each event changes the job status. Since _JobStatusChangeEvent_ only tracks a single pair of old and new JobStatus objects, what you're really saying is that you can add events as long as each one independently changes the job status without affecting the other events. What prevents a user, for example, from adding two RUN_STATE events?
Each one changes the job status, but you can only keep track of two of them. I think conceptually, a _JobStatusChangeEvent_ object should map to a single event change, which in turn maps to a single pair of JobStatus objects. That's much cleaner. During the normal running of the JobTracker, you only create a _JobStatusChangeEvent_ object for a single event. It's only in that one use case for recovering jobs where you apply multiple changes to a job status, and I think it's OK to call updateJobListeners() multiple times. Otherwise, you muddle up the semantics of a _JobStatusChangeEvent_ object.

> Schedulers need to know when a job has completed
> ------------------------------------------------
>
>                 Key: HADOOP-4053
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4053
>             Project: Hadoop Core
>          Issue Type: Improvement
>    Affects Versions: 0.19.0
>            Reporter: Vivek Ratan
>            Assignee: Amar Kamat
>            Priority: Blocker
>         Attachments: HADOOP-4053-v1.patch, HADOOP-4053-v2.patch, HADOOP-4053-v3.1.patch, HADOOP-4053-v3.2.patch
>
>
> The JobInProgressListener interface is used by the framework to notify Schedulers when jobs are added, removed, or updated. Right now, there is no way for the Scheduler to know that a job has completed. jobRemoved() is called when a job is retired, which can happen many hours after a job is actually completed. jobUpdated() is called when a job's priority is changed. We need to notify a listener when a job has completed (whether it succeeded, failed, or was killed).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
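
The design suggested in the comment above (one event object per change, with the old and new JobStatus passed in explicitly) can be illustrated with a minimal Java sketch. Everything in it is a simplified assumption for illustration: the classes are hypothetical stand-ins rather than the actual Hadoop types, RUN_STATE_CHANGED is an assumed name modeled on the FINISH_TIME_CHANGED suggestion, and eventsForRecovery() and isCompletion() are invented helpers that do not come from any of the attached patches.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical stand-ins for the Hadoop classes discussed above.
// Nothing here is taken from the HADOOP-4053 patches.
class JobStatusChangeSketch {

    enum RunState { PREP, RUNNING, SUCCEEDED, FAILED, KILLED }

    // Immutable snapshot of the two job attributes used in this sketch.
    static final class JobStatus {
        final RunState runState;
        final long finishTime;

        JobStatus(RunState runState, long finishTime) {
            this.runState = runState;
            this.finishTime = finishTime;
        }
    }

    // Event types named after what happened. FINISH_TIME_CHANGED is the name
    // proposed in the comment; RUN_STATE_CHANGED is an assumed analogue.
    enum EventType { RUN_STATE_CHANGED, FINISH_TIME_CHANGED }

    // One event describes exactly one change and carries both statuses explicitly.
    static final class JobStatusChangeEvent {
        final EventType type;
        final JobStatus oldStatus;
        final JobStatus newStatus;

        JobStatusChangeEvent(EventType type, JobStatus oldStatus, JobStatus newStatus) {
            this.type = type;
            this.oldStatus = oldStatus;
            this.newStatus = newStatus;
        }
    }

    // A listener can tell completion apart from PREP -> RUNNING by comparing
    // the old and new run states, so one event type covers both transitions.
    static boolean isCompletion(JobStatusChangeEvent e) {
        if (e.type != EventType.RUN_STATE_CHANGED) {
            return false;
        }
        RunState now = e.newStatus.runState;
        return e.oldStatus.runState == RunState.RUNNING
                && (now == RunState.SUCCEEDED || now == RunState.FAILED || now == RunState.KILLED);
    }

    // Recovery case: several attributes changed at once. Instead of packing
    // them into one event, emit one event per change, each with its own
    // old/new pair, so listeners can be notified once per event.
    static List<JobStatusChangeEvent> eventsForRecovery(JobStatus before, JobStatus after) {
        List<JobStatusChangeEvent> events = new ArrayList<JobStatusChangeEvent>();
        JobStatus current = before;
        if (current.runState != after.runState) {
            JobStatus next = new JobStatus(after.runState, current.finishTime);
            events.add(new JobStatusChangeEvent(EventType.RUN_STATE_CHANGED, current, next));
            current = next;
        }
        if (current.finishTime != after.finishTime) {
            JobStatus next = new JobStatus(current.runState, after.finishTime);
            events.add(new JobStatusChangeEvent(EventType.FINISH_TIME_CHANGED, current, next));
        }
        return events;
    }

    public static void main(String[] args) {
        JobStatus before = new JobStatus(RunState.RUNNING, 0L);
        JobStatus after = new JobStatus(RunState.SUCCEEDED, 1223309264385L);
        for (JobStatusChangeEvent e : eventsForRecovery(before, after)) {
            // Each event would be delivered in its own listener call.
            System.out.println(e.type + " completion=" + isCompletion(e));
        }
    }
}
```

Because every event holds its own pair of statuses, adding two run-state changes is no longer ambiguous: each is a separate object with a well-defined before and after.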