Return-Path: Delivered-To: apmail-hadoop-core-dev-archive@www.apache.org Received: (qmail 96539 invoked from network); 6 Oct 2008 11:40:37 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2) by minotaur.apache.org with SMTP; 6 Oct 2008 11:40:37 -0000 Received: (qmail 76462 invoked by uid 500); 6 Oct 2008 11:40:34 -0000 Delivered-To: apmail-hadoop-core-dev-archive@hadoop.apache.org Received: (qmail 76413 invoked by uid 500); 6 Oct 2008 11:40:34 -0000 Mailing-List: contact core-dev-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: core-dev@hadoop.apache.org Delivered-To: mailing list core-dev@hadoop.apache.org Received: (qmail 76401 invoked by uid 99); 6 Oct 2008 11:40:34 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 06 Oct 2008 04:40:34 -0700 X-ASF-Spam-Status: No, hits=-1999.9 required=10.0 tests=ALL_TRUSTED,DNS_FROM_SECURITYSAGE X-Spam-Check-By: apache.org Received: from [140.211.11.140] (HELO brutus.apache.org) (140.211.11.140) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 06 Oct 2008 11:39:39 +0000 Received: from brutus (localhost [127.0.0.1]) by brutus.apache.org (Postfix) with ESMTP id ACF8E234C21E for ; Mon, 6 Oct 2008 04:39:44 -0700 (PDT) Message-ID: <560584485.1223293184705.JavaMail.jira@brutus> Date: Mon, 6 Oct 2008 04:39:44 -0700 (PDT) From: "Hemanth Yamijala (JIRA)" To: core-dev@hadoop.apache.org Subject: [jira] Commented: (HADOOP-4053) Schedulers need to know when a job has completed In-Reply-To: <156825368.1220328524223.JavaMail.jira@brutus> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Virus-Checked: Checked by ClamAV on apache.org [ https://issues.apache.org/jira/browse/HADOOP-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637087#action_12637087 ] Hemanth Yamijala commented on HADOOP-4053: ------------------------------------------ Some comments: JobChangeEvent: - Should be package-private. - As a convention, should we extend {{java.util.Event}} ? Source can be the JIP. JobStatusChangeEvent: - Should be package-private. - Javadoc seems to indicate an event is raised for progress and finish time changes as well, when it is not. This should be removed. - The enum {{Events}} is actually specifying type of events - so should it read {{EventType}} ? - IMO, doing the {{clone}} inside the methods of this class, while convenient, can lead to erroneous usage. For e.g. if someone changes the JobStatus before creating the event object, the old status is lost. And this cannot be captured anywhere else except in documentation, which could be missed. I think the usage will become more clear if {{JobStatusChangeEvent}} is written like this: {code} class JobStatusChangeEvent extends JobChangeEvent { public JobStatusChangeEvent(JobInProgress source, List events, JobStatus oldStatus, JobStatus newStatus) { super(source); this.events = events this.oldStatus = oldStatus; this.newStatus = newStatus; } // ... other APIs } {code} and leave the responsibility of taking a snapshot to the callers. This means the callers write a bit more code, but it is less error prone. Also, I checked some of the event classes in the java API, and they seem to have a similar structure. For e.g. look at {{javax.naming.event.NamingEvent}} or {{java.util.prefs.NodeChangeEvent}} Other JobInProgressListener sub-classes: - So that future code is easier to write, we need to check for the type of event being {{JobStatusChangeEvent}} and the events enum is of the type we are interested in (time changes and priority changes) to default to the current implementation. - I think it is OK to handle run state changes also in this JIRA, and behave similarly to {{jobRemoved}} atleast for the {{JobQueueJobInProgressListener}} and {{EagerTaskInitializationListener}}. JobQueuesManager: - It would be nice to add a comment in {{jobRemoved}} mentioning we already handle removals when the run state changes. - It may be safer to check for {{runState}} to be RUNNING before calling {{promoteJob}}. And maybe {{promoteJob}} should be called {{makeJobRunning}} or something. JobTracker: - In {{RecoveryManager}}, previously {{jobUpdated}} was handled in the scheduler only once. Now, since we add separate events for PRIORITY and START_TIME, the same code would be executed twice. I think this should be avoided, maybe in the implementation of {{JobQueuesManager.jobUpdated}}. Tests: - There are some spurious System.out.printlns, which can change to LOG.debug - We can use {{scheduler.getJobs()}} and check the job is not present in the list. This would make sure that the UI also will reflect the change. - Can we add a test case for job priority change and the {{promoteJob}} code path as well ? > Schedulers need to know when a job has completed > ------------------------------------------------ > > Key: HADOOP-4053 > URL: https://issues.apache.org/jira/browse/HADOOP-4053 > Project: Hadoop Core > Issue Type: Improvement > Affects Versions: 0.19.0 > Reporter: Vivek Ratan > Assignee: Amar Kamat > Priority: Blocker > Attachments: HADOOP-4053-v1.patch, HADOOP-4053-v2.patch, HADOOP-4053-v3.1.patch, HADOOP-4053-v3.2.patch > > > The JobInProgressListener interface is used by the framework to notify Schedulers of when jobs are added, removed, or updated. Right now, there is no way for the Scheduler to know that a job has completed. jobRemoved() is called when a job is retired, which can happen many hours after a job is actually completed. jobUpdated() is called when a job's priority is changed. We need to notify a listener when a job has completed (either successfully, or has failed or been killed). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.