hadoop-common-dev mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4053) Schedulers need to know when a job has completed
Date Thu, 25 Sep 2008 12:17:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634460#action_12634460 ]

Steve Loughran commented on HADOOP-4053:
----------------------------------------

My needs aren't so much job scheduling as workflow integration. I'm just listening for job
lifecycle events so that I can match that lifecycle in remote code. As of yesterday I have
simple MR jobs being deployed against a dynamically instantiated set of hadoop processes,
using job.getStatus() to poll the state of the job and detecting success/failure when the
job declares itself completed (roughly the loop sketched below). But already I can see that
my tests get into trouble here, as they tear down the processes once the job is finished, and
I see error messages in the test log complaining that the trackers can't write their task/job
histories because the filesystem has gone down. I need to
 - consider moving from polling to notifications to check job state (these would be RMI calls
or something similar, hence slow)
 - wait until the job and task trackers are completely done with processing the jobs before
pulling out the results and shutting down the cluster
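
For reference, the polling side today is roughly this (a sketch against the old mapred client
API; the conf setup, poll interval and exception handling are placeholders, not what my code
literally does):
{{{
  // Sketch: poll a submitted job until it declares itself complete, then check the outcome.
  // Assumes 'conf' is a JobConf already set up for the job.
  JobClient client = new JobClient(conf);
  RunningJob running = client.submitJob(conf);
  while (!running.isComplete()) {
    Thread.sleep(1000);   // arbitrary poll interval
  }
  if (!running.isSuccessful()) {
    throw new IOException("Job " + running.getID() + " failed");
  }
}}}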

So: no expectation that the base methods do anything; I'm just relaying events to other programs
that may or may not care.

For the queue, I'd have a single queue of job events {{Queue<JobLifecycleEvent> events}}
and handle
{{{
  public void jobCompleted(JobInProgress jip) {
    events.add(new JobLifecycleEvent(JobLifecycleEventType.COMPLETED, jip));
  }
}}}
then the queue thread would forward these off to whatever remote entity cared (a sketch of
that consumer thread is below).
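
On the consuming side, something like this (a sketch only; the queue here is a BlockingQueue
variant so take() blocks, and forwardToRemoteListeners() is a made-up name for whatever slow
RMI-ish call relays the event):
{{{
  // Sketch: a dedicated daemon thread drains the event queue and relays events,
  // so JobTracker threads never block on remote calls.
  private final BlockingQueue<JobLifecycleEvent> events =
      new LinkedBlockingQueue<JobLifecycleEvent>();

  Thread forwarder = new Thread(new Runnable() {
    public void run() {
      try {
        while (true) {
          JobLifecycleEvent event = events.take();  // blocks until an event is queued
          forwardToRemoteListeners(event);          // hypothetical slow remote call
        }
      } catch (InterruptedException ie) {
        // interrupted during shutdown; stop forwarding
      }
    }
  }, "job-event-forwarder");
  forwarder.setDaemon(true);
  forwarder.start();
}}}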

Given that schedulers and other listeners behave differently, I'm now not so sure about a
base class. The javadocs for the listener need to make it clear that blocking isn't allowed
so that anyone providing a listener knows to do async work if needed.
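
e.g. a listener that does have slow work to do could hand it off to its own executor so the
callback returns immediately (again just a sketch; notifyWorkflowEngine() is a placeholder for
my remote notification):
{{{
  // Sketch: keep the callback non-blocking by handing slow work to an executor
  // owned by the listener, not by the JobTracker.
  private final ExecutorService executor = Executors.newSingleThreadExecutor();

  public void jobCompleted(final JobInProgress jip) {
    executor.submit(new Runnable() {
      public void run() {
        notifyWorkflowEngine(jip);   // placeholder for the slow/remote notification
      }
    });
  }
}}}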

> Schedulers need to know when a job has completed
> ------------------------------------------------
>
>                 Key: HADOOP-4053
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4053
>             Project: Hadoop Core
>          Issue Type: Improvement
>    Affects Versions: 0.19.0
>            Reporter: Vivek Ratan
>            Assignee: Amar Kamat
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-4053-v1.patch
>
>
> The JobInProgressListener interface is used by the framework to notify Schedulers of
> when jobs are added, removed, or updated. Right now, there is no way for the Scheduler to
> know that a job has completed. jobRemoved() is called when a job is retired, which can happen
> many hours after a job is actually completed. jobUpdated() is called when a job's priority
> is changed. We need to notify a listener when a job has completed (either successfully, or
> has failed or been killed).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

