hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-1558) changes to OutputFormat to work on temporary directory to enable re-running crashed jobs (Issue: 1121)
Date Tue, 10 Jul 2007 21:59:04 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Doug Cutting updated HADOOP-1558:

    Fix Version/s:     (was: 0.14.0)
           Status: Open  (was: Patch Available)

This is a good feature, but it's going to be more complicated to implement.  We only instantiate
user classes in task and client jvms, never in jobtracker or tasktracker jvms.  So initialize()
and commit() need to be run as tasks: InitializeTask and CommitTask.  Adding new task classes
should be easy in principle, but it might not be in practice.  Also, getUncommittedOutputDirectory()
is specific to file-based output formats and so does not belong in the OutputFormat interface,
but rather on a base class for file-based outputs.  We should probably rename OutputFormatBase
to be FileOutputFormat, just as we renamed InputFormatBase to be FileInputFormat.
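
To illustrate the layering suggested above, here is a minimal sketch in plain Java. It is not
taken from the attached patch. JobConf is replaced by a small stand-in class, the property
name "sketch.output.dir" and the "_tmp_" naming scheme are hypothetical, and making
getUncommittedOutputDirectory() a static method that returns a java.nio.file.Path is an
assumption made only for this example.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

class OutputFormatLayeringSketch {

    /** Stand-in for Hadoop's JobConf: just a string map for this sketch. */
    static class JobConf {
        private final Map<String, String> props = new HashMap<>();
        void set(String key, String value) { props.put(key, value); }
        String get(String key) { return props.get(key); }
    }

    /** Generic interface: nothing here assumes that output lives in files. */
    interface OutputFormat {
        /** Called to initialize output for this job. */
        void initialize(JobConf job) throws IOException;
        /** Called to finalize output for this job. */
        void commit(JobConf job) throws IOException;
    }

    /** Base class for file-based outputs; the directory accessor lives here. */
    abstract static class FileOutputFormat implements OutputFormat {
        // Hypothetical property name, used only by this sketch.
        static final String OUTPUT_DIR_KEY = "sketch.output.dir";

        // Hypothetical naming scheme: a "_tmp_" sibling of the final directory.
        static Path getUncommittedOutputDirectory(JobConf job) {
            Path finalDir = Paths.get(job.get(OUTPUT_DIR_KEY));
            return finalDir.resolveSibling("_tmp_" + finalDir.getFileName());
        }
    }

    /** A non-file format implements OutputFormat with no directory notion. */
    static class InMemoryOutputFormat implements OutputFormat {
        boolean initialized;
        boolean committed;
        @Override public void initialize(JobConf job) { initialized = true; }
        @Override public void commit(JobConf job) { committed = true; }
    }

    /** Helper for the demo: uncommitted directory for a given final directory. */
    static String uncommittedDirFor(String finalDir) {
        JobConf job = new JobConf();
        job.set(FileOutputFormat.OUTPUT_DIR_KEY, finalDir);
        return FileOutputFormat.getUncommittedOutputDirectory(job).toString();
    }

    public static void main(String[] args) {
        System.out.println(uncommittedDirFor("out/wordcount"));
    }
}
```

The point of the split is that a format such as the in-memory one above can implement
initialize() and commit() without ever being asked for a directory.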

> changes to OutputFormat to work on temporary directory to enable re-running crashed jobs (Issue: 1121)
> ------------------------------------------------------------------------------------------------------
>                 Key: HADOOP-1558
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1558
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>         Environment: all
>            Reporter: Alejandro Abdelnur
>         Attachments: hadoop-1558-JUN1007-1934.txt
> Add  OutputFormat methods like:
> /** Called to initialize output for this job. */
> void initialize(JobConf job) throws IOException;
> /** Called to finalize output for this job. */
> void commit(JobConf job) throws IOException;
> In the base implementation for FileSystem output, initialize() might then create a temporary
> directory for the job, removing any that already exists, and commit() could rename the temporary
> output directory to the final name.
> The existing checkOutputSpecs() would continue to throw an exception if the final output
> already exists.
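
The flow described in the issue can be sketched as follows. This is only an illustration on
the local filesystem using java.nio.file, not the Hadoop FileSystem API and not the attached
patch. The class name, the "_tmp_" sibling naming, and the demo() helper are all hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class TempDirCommitSketch {

    // Hypothetical naming scheme: a "_tmp_" sibling of the final directory.
    static Path tempDirFor(Path finalDir) {
        return finalDir.resolveSibling("_tmp_" + finalDir.getFileName());
    }

    /** Fails if the final output already exists. */
    static void checkOutputSpecs(Path finalDir) throws IOException {
        if (Files.exists(finalDir)) {
            throw new FileAlreadyExistsException(finalDir.toString());
        }
    }

    /** Creates a fresh temporary directory, removing any left by a crashed run. */
    static void initialize(Path finalDir) throws IOException {
        Path tmp = tempDirFor(finalDir);
        deleteRecursively(tmp);
        Files.createDirectories(tmp);
    }

    /** Renames the temporary directory to the final name. */
    static void commit(Path finalDir) throws IOException {
        Files.move(tempDirFor(finalDir), finalDir);
    }

    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order so that children are deleted before their parents.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    /** Simulates a crashed run followed by a successful re-run. */
    static boolean demo() {
        try {
            Path work = Files.createTempDirectory("hadoop1558-sketch");
            Path finalDir = work.resolve("output");

            // First run: writes partial output, then "crashes" before commit.
            checkOutputSpecs(finalDir);
            initialize(finalDir);
            Path part = tempDirFor(finalDir).resolve("part-00000");
            Files.write(part, "partial".getBytes(StandardCharsets.UTF_8));

            // Re-run: the final directory was never created, so the check passes.
            checkOutputSpecs(finalDir);
            initialize(finalDir);
            boolean staleGone = !Files.exists(part);
            Files.write(part, "complete".getBytes(StandardCharsets.UTF_8));
            commit(finalDir);
            boolean committed = Files.exists(finalDir.resolve("part-00000"))
                    && !Files.exists(tempDirFor(finalDir));

            // A third run must be rejected because the final output now exists.
            boolean rejected;
            try {
                checkOutputSpecs(finalDir);
                rejected = false;
            } catch (FileAlreadyExistsException e) {
                rejected = true;
            }

            deleteRecursively(work);
            return staleGone && committed && rejected;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("re-run succeeded: " + demo());
    }
}
```

Because the final directory only appears at commit time, a crashed job leaves nothing that
checkOutputSpecs() would object to, which is what makes the re-run possible.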

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
