hadoop-common-dev mailing list archives

From "Alejandro Abdelnur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1558) changes to OutputFormat to work on temporary directory to enable re-running crashed jobs (Issue: 1121)
Date Wed, 11 Jul 2007 07:42:04 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511679 ]

Alejandro Abdelnur commented on HADOOP-1558:

Adding answer by email:
The suggestions make sense. I was looking at the Task class and it seems too specific to Map/Reduce
tasks, so I'll need some help here. Is it your intention to run the initialize/commit Tasks on the
JT box, or should they run on the slaves?

I just had another idea that would require neither a separate task for initialize/commit nor running
custom code in the JT.

A new interface:

public interface OutputHandler {
  public void initialize(JobConf conf) throws IOException;
  public void commit(JobConf conf) throws IOException;
  public Path getOutputDirPath(JobConf conf) throws IOException;
}

Provide 2 implementations of it:

1. FileOutputHandler that does the handling implemented by the patch.
2. NOPOutputHandler that does nothing (a no-op).
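
As a rough sketch of how the two implementations could behave, the following uses java.nio.file on a local filesystem instead of Hadoop's FileSystem, and passes the final output directory directly where the proposal passes a JobConf. All type names here (OutputHandlerSketch, FileOutputHandlerSketch, NopOutputHandlerSketch) and the "_tmp_" naming convention are assumptions made for illustration; they are not taken from the patch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for the proposed interface; a plain Path replaces JobConf.
interface OutputHandlerSketch {
  void initialize(Path finalDir) throws IOException;
  void commit(Path finalDir) throws IOException;
  Path getOutputDirPath(Path finalDir) throws IOException;
}

// Writes into a temporary sibling directory and renames it on commit.
class FileOutputHandlerSketch implements OutputHandlerSketch {
  // Assumed naming convention for the temporary directory.
  private static Path tmpDir(Path finalDir) {
    return finalDir.resolveSibling("_tmp_" + finalDir.getFileName());
  }

  public void initialize(Path finalDir) throws IOException {
    Path tmp = tmpDir(finalDir);
    if (Files.exists(tmp)) {
      // Remove leftovers from a crashed run, deepest entries first.
      List<Path> entries;
      try (Stream<Path> walk = Files.walk(tmp)) {
        entries = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
      }
      for (Path entry : entries) {
        Files.delete(entry);
      }
    }
    Files.createDirectories(tmp);
  }

  public void commit(Path finalDir) throws IOException {
    // Fails if finalDir already exists, so an earlier result is never overwritten.
    Files.move(tmpDir(finalDir), finalDir);
  }

  public Path getOutputDirPath(Path finalDir) {
    return tmpDir(finalDir);
  }
}

// Does nothing; tasks write straight to the final directory.
class NopOutputHandlerSketch implements OutputHandlerSketch {
  public void initialize(Path finalDir) { }
  public void commit(Path finalDir) { }
  public Path getOutputDirPath(Path finalDir) { return finalDir; }
}
```

With the file-based variant, tasks would write to whatever getOutputDirPath returns, so a crashed job leaves the final directory untouched and can simply be run again.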

Add to the OutputFormat interface a method:

  public Class getOutputHandlerClass();

This method must return the OutputHandler implementation the OutputFormat requires. It must
be a Hadoop-provided implementation (for now, one of the 2 above).

The JobConf, upon setting the OutputFormat class, will set an internal property with the declared
OutputHandler class.

The JobTracker and JobInProgress will use this property to instantiate and run the OutputHandler
initialize/commit methods.
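
The wiring described above could look roughly like the sketch below. It is self-contained and uses java.util.Properties in place of JobConf; every name in it (HandlerSketch, FormatSketch, HandlerWiringSketch, the property key "sketch.output.handler.class") is hypothetical and chosen only for illustration, not an existing Hadoop API.

```java
import java.util.Properties;

// Hypothetical stand-in for OutputHandler; Properties plays the role of JobConf.
interface HandlerSketch {
  void initialize(Properties conf);
  void commit(Properties conf);
}

// Stand-in for the no-op handler.
class NopHandlerSketch implements HandlerSketch {
  public void initialize(Properties conf) { }
  public void commit(Properties conf) { }
}

// Stand-in for a handler that does real work; here it only records the calls.
class TracingHandlerSketch implements HandlerSketch {
  public void initialize(Properties conf) { conf.setProperty("trace.initialized", "true"); }
  public void commit(Properties conf) { conf.setProperty("trace.committed", "true"); }
}

// Stand-in for the OutputFormat side: it only declares which handler it needs.
interface FormatSketch {
  Class<? extends HandlerSketch> getOutputHandlerClass();
}

class TracingFormatSketch implements FormatSketch {
  public Class<? extends HandlerSketch> getOutputHandlerClass() {
    return TracingHandlerSketch.class;
  }
}

class HandlerWiringSketch {
  // Assumed name for the internal property.
  static final String HANDLER_KEY = "sketch.output.handler.class";

  // What the configuration might do when the output format class is set.
  static void setOutputFormat(Properties conf, Class<? extends FormatSketch> formatClass)
      throws ReflectiveOperationException {
    FormatSketch format = formatClass.getDeclaredConstructor().newInstance();
    conf.setProperty(HANDLER_KEY, format.getOutputHandlerClass().getName());
  }

  // What the job tracker side might do: read the property and instantiate the handler.
  static HandlerSketch newHandler(Properties conf) throws ReflectiveOperationException {
    String name = conf.getProperty(HANDLER_KEY, NopHandlerSketch.class.getName());
    return Class.forName(name)
        .asSubclass(HandlerSketch.class)
        .getDeclaredConstructor()
        .newInstance();
  }
}
```

Since only the handler class name travels in the configuration, the job tracker side loads a known handler class and never has to load the user's output format.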

Thus no custom code in the JT and no need for new Task classes.

> changes to OutputFormat to work on temporary directory to enable re-running crashed jobs (Issue: 1121)
> ------------------------------------------------------------------------------------------------------
>                 Key: HADOOP-1558
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1558
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>         Environment: all
>            Reporter: Alejandro Abdelnur
>         Attachments: hadoop-1558-JUN1007-1934.txt
> Add OutputFormat methods like:
> /** Called to initialize output for this job. */
> void initialize(JobConf job) throws IOException;
> /** Called to finalize output for this job. */
> void commit(JobConf job) throws IOException;
> In the base implementation for FileSystem output, initialize() might then create a temporary
> directory for the job, removing any that already exists, and commit() could rename the temporary
> output directory to the final name.
> The existing checkOutputSpecs() would continue to throw an exception if the final output
> already exists.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
