hadoop-common-dev mailing list archives

From "Amar Kamat (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4261) Jobs failing in the init stage will never cleanup
Date Mon, 29 Sep 2008 08:58:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12635317#action_12635317 ]

Amar Kamat commented on HADOOP-4261:

A few comments w.r.t. job-recovery:
1) Upon restart, the task-completion-events/task-reports for the setup tasks should also match.
2) It would make more sense to mark the job's run-state as {{SETUP}} when {{logInited()}} is
invoked. While recovering, check whether the SETUP state was reached before calling {{init()}}.
3) Check if {{JobInProgress.obtainSetupTask()}} can reuse {{JobInProgress.addRunningTaskToTIP()}}.
4) I think {{JobInProgress.canLaunchSetupTask()}} can also be written as
private synchronized boolean canLaunchSetupTask() {
    // check if the job is in PREP, initialized, and the setup task not yet launched
    return status.getRunState() == JobStatus.PREP && tasksInited.get() &&
           !launchedSetup;
}
5) I don't see any code that deals with the setup task in job-recovery, i.e. the recovery-manager.
Just make sure that the effect of scheduling setup tasks before restart is the same as the effect
of replaying them from history. I assume that when the JIP is given a task-attempt update, it
figures out whether the task is a setup task or not. Ideally, the way setup is launched from the
recovery-manager should mimic the way it is invoked from the real (live) jobtracker.
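The parity requirement in point 5 can be sketched as follows. This is a minimal, self-contained model, not the actual Hadoop classes: the class and method names ({{SetupReplayParity}}, {{launchSetupLive()}}, {{replayFromHistory()}}, and so on) are illustrative assumptions. The point it demonstrates is that recovery should route replayed history events through the same update logic as the live path, so the resulting job state is identical.

```java
// Hypothetical sketch: replaying a setup-task attempt from history should
// leave the job in the same state as having scheduled it live.
public class SetupReplayParity {

    static class JobState {
        boolean launchedSetup = false;
        boolean setupDone = false;

        // Live path: the jobtracker schedules the setup task...
        void launchSetupLive() { launchedSetup = true; }

        // ...and later receives its completion event.
        void setupCompleted() { setupDone = true; }

        // Recovery path: replay the recorded events from history, routing
        // them through the SAME update methods the live path uses, rather
        // than mutating state through a separate recovery-only code path.
        void replayFromHistory() {
            launchSetupLive();   // mimic the live launch
            setupCompleted();    // mimic the recorded completion
        }

        boolean sameAs(JobState other) {
            return launchedSetup == other.launchedSetup
                && setupDone == other.setupDone;
        }
    }

    public static void main(String[] args) {
        JobState live = new JobState();
        live.launchSetupLive();
        live.setupCompleted();

        JobState recovered = new JobState();
        recovered.replayFromHistory();

        System.out.println("parity=" + live.sameAs(recovered));
    }
}
```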

> Jobs failing in the init stage will never cleanup
> -------------------------------------------------
>                 Key: HADOOP-4261
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4261
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amar Kamat
>            Assignee: Amareshwari Sriramadasu
>            Priority: Blocker
>             Fix For: 0.19.0
>         Attachments: patch-4261.txt
> Pre HADOOP-3150, if the job failed in the init stage, {{job.kill()}} was called. This
used to make sure that the job was cleaned up w.r.t.
> - status set to KILLED/FAILED
> - job files from the system dir are deleted
> - closing of job history files
> - making jobtracker aware of this through {{jobTracker.finalizeJob()}}
> - cleaning up the data structures via {{JobInProgress.garbageCollect()}}
> Now if the job fails in the init stage, {{job.fail()}} is called, which doesn't do the
cleanup. HADOOP-3150 introduces cleanup tasks, which are launched once the job completes, i.e.
is killed/failed/succeeded. The jobtracker will never consider this job for scheduling, as the job
will be in the {{PREP}} state forever.
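The failure mode described above can be modeled with a small, self-contained sketch. These are hypothetical names, not the real Hadoop {{JobInProgress}}/{{JobTracker}} classes: it only illustrates the mechanism, under the assumption (stated in the description) that cleanup tasks run once a job leaves {{PREP}}, so a {{fail()}} that never transitions the state strands the job uncleaned.

```java
// Hypothetical model of the bug: kill() transitions state and cleans up,
// while fail() leaves the job in PREP, so the cleanup task is never launched.
public class InitFailureModel {

    enum RunState { PREP, RUNNING, FAILED, KILLED, SUCCEEDED }

    static class Job {
        RunState state = RunState.PREP;
        boolean cleanedUp = false;

        // Pre-HADOOP-3150 style: kill() both moves the run-state
        // and performs the cleanup directly.
        void kill() {
            state = RunState.KILLED;
            garbageCollect();
        }

        // Post-HADOOP-3150 style: fail() defers cleanup to a cleanup task,
        // but never transitions the job out of PREP.
        void fail() {
            // state stays PREP; no cleanup is done here
        }

        // Cleanup tasks are only launched for jobs that have completed,
        // i.e. left the PREP state.
        void maybeScheduleCleanup() {
            if (state != RunState.PREP) {
                garbageCollect();
            }
        }

        // Stand-in for deleting system-dir files, closing history files,
        // finalizeJob(), and JobInProgress.garbageCollect().
        void garbageCollect() { cleanedUp = true; }
    }

    public static void main(String[] args) {
        Job killed = new Job();
        killed.kill();
        System.out.println("kill(): state=" + killed.state
            + " cleanedUp=" + killed.cleanedUp);

        Job failed = new Job();
        failed.fail();
        failed.maybeScheduleCleanup();
        // The failed job is stuck: still PREP, never cleaned up.
        System.out.println("fail(): state=" + failed.state
            + " cleanedUp=" + failed.cleanedUp);
    }
}
```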

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
