hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1558) changes to OutputFormat to work on temporary directory to enable re-running crashed jobs (Issue: 1121)
Date Mon, 20 Aug 2007 17:15:31 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12521163 ]

Doug Cutting commented on HADOOP-1558:

> this solution means the JT has to (potentially) execute client code

No, the simplest implementation might do that, but that wouldn't be acceptable.  We can probably
promote tasks in the task process.  Tasks might check with the jobtracker if they were the
winning invocation of the task, and promote themselves if they are.  We might even be able
to promote the job this way: when the last task is promoted, the job could be promoted in
that same jvm.  Abandoning failed tasks is trickier, since the task process may no longer
exist.  Job abandonment is similarly tricky.  In these cases I can see no way to avoid running
a special task.  Perhaps we can run a single cleanup task to abandon all failed tasks and
the job?
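
As an illustration of the self-promotion idea above, here is a minimal sketch in plain Java. It does not use any Hadoop API: WinnerRegistry is a hypothetical stand-in for the jobtracker's answer to "was I the winning invocation?", TaskPromoter is a hypothetical helper, and local files stand in for the task's temporary and final output. The first-to-ask-wins rule is only an assumption made for the sketch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the jobtracker's record of winning task attempts. */
class WinnerRegistry {
    private final Map<String, String> winners = new ConcurrentHashMap<>();

    /** Assumption for this sketch: the first attempt to ask for a task wins. */
    boolean isWinner(String taskId, String attemptId) {
        // computeIfAbsent returns the attempt already recorded, or records this one.
        return attemptId.equals(winners.computeIfAbsent(taskId, k -> attemptId));
    }
}

/** Hypothetical helper that runs inside the task process, not in the jobtracker. */
class TaskPromoter {
    /** Renames the attempt's temporary output to its final name if this attempt won. */
    static boolean promoteIfWinner(WinnerRegistry tracker, String taskId, String attemptId,
                                   Path tempOutput, Path finalOutput) throws IOException {
        if (!tracker.isWinner(taskId, attemptId)) {
            // A losing attempt leaves its temporary output behind; abandoning it
            // is the part that may need a separate cleanup task.
            return false;
        }
        Files.move(tempOutput, finalOutput); // fails if finalOutput already exists
        return true;
    }
}
```

In this sketch the losing attempt's output is never touched by the promoter, which matches the observation that abandonment cannot rely on the original task process still existing.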

> changes to OutputFormat to work on temporary directory to enable re-running crashed jobs
> (Issue: 1121)
> ------------------------------------------------------------------------------------------------------
>                 Key: HADOOP-1558
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1558
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>         Environment: all
>            Reporter: Alejandro Abdelnur
>             Fix For: 0.15.0
>         Attachments: hadoop-1558-JUL2607-1600.txt
> Add OutputFormat methods like:
> /** Called to initialize output for this job. */
> void initialize(JobConf job) throws IOException;
> /** Called to finalize output for this job. */
> void commit(JobConf job) throws IOException;
> In the base implementation for FileSystem output, initialize() might then create a temporary
> directory for the job, removing any that already exists, and commit() could rename the temporary
> output directory to the final name.
> The existing checkOutputSpecs() would continue to throw an exception if the final output
> already exists.
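
The lifecycle described in the issue text can be sketched with java.nio.file on a local filesystem. This is an illustration only, not the attached patch and not Hadoop's FileSystem API: the class name TempDirOutput, the "_temporary" suffix, and the use of local paths are all assumptions made for the example.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical local-filesystem analogue of the proposed initialize()/commit() pair. */
class TempDirOutput {
    private final Path finalDir;
    private final Path tempDir;

    TempDirOutput(Path finalDir) {
        this.finalDir = finalDir;
        // Assumed naming scheme: a sibling directory with a "_temporary" suffix.
        this.tempDir = finalDir.resolveSibling(finalDir.getFileName() + "_temporary");
    }

    Path tempDir() {
        return tempDir;
    }

    /** Still refuses to run when the final output already exists. */
    void checkOutputSpecs() throws IOException {
        if (Files.exists(finalDir)) {
            throw new FileAlreadyExistsException(finalDir.toString());
        }
    }

    /** Creates a fresh temporary directory, removing one left by a crashed run. */
    void initialize() throws IOException {
        if (Files.exists(tempDir)) {
            List<Path> toDelete;
            try (Stream<Path> paths = Files.walk(tempDir)) {
                // Reverse order puts children before their parent directories.
                toDelete = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            }
            for (Path p : toDelete) {
                Files.delete(p);
            }
        }
        Files.createDirectories(tempDir);
    }

    /** Renames the temporary directory to the final name. */
    void commit() throws IOException {
        // A plain rename; both directories share a parent, so no copy is needed.
        Files.move(tempDir, finalDir);
    }
}
```

Because a crashed run never reaches commit(), the final directory does not exist afterwards, so checkOutputSpecs() passes on the re-run and initialize() discards the stale temporary output.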

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.