hadoop-common-dev mailing list archives

From "Alejandro Abdelnur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3150) Move task file promotion into the task
Date Fri, 18 Jul 2008 10:20:32 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12614686#action_12614686 ]

Alejandro Abdelnur commented on HADOOP-3150:

A couple of comments on the latest patch (JUL/17):

*1 {{cleanupTask}} method*

The name of {{OutputFormat.cleanupTask(TaskAttemptContext taskContext, boolean promote)}} does not
match what the method is supposed to do, which makes it unintuitive. I would suggest having
2 methods instead, {{commitTask(TaskAttemptContext ctx)}} and {{discardTask(TaskAttemptContext ctx)}},
so that the intention is clear when looking at the methods and their usage
(a boolean that makes one method do two opposite things is confusing).
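To make the readability argument concrete, here is a minimal sketch contrasting the two shapes. The {{TaskAttemptContext}} below is a hypothetical stand-in that only carries an id, and {{RecordingCommitter}} and {{BooleanAdapter}} are illustrative names, not part of the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the real TaskAttemptContext; it only carries an id.
final class TaskAttemptContext {
    final String attemptId;
    TaskAttemptContext(String attemptId) { this.attemptId = attemptId; }
}

// Shape used in the patch: one method whose meaning flips with the flag.
interface BooleanStyleCleanup {
    void cleanupTask(TaskAttemptContext ctx, boolean promote);
}

// Suggested shape: one method per intention.
interface TwoMethodStyle {
    void commitTask(TaskAttemptContext ctx);
    void discardTask(TaskAttemptContext ctx);
}

// Illustrative implementation that records which call was made.
final class RecordingCommitter implements TwoMethodStyle {
    final List<String> log = new ArrayList<String>();
    public void commitTask(TaskAttemptContext ctx) { log.add("commit " + ctx.attemptId); }
    public void discardTask(TaskAttemptContext ctx) { log.add("discard " + ctx.attemptId); }
}

// Adapter showing how the boolean form maps onto the two explicit methods.
final class BooleanAdapter implements BooleanStyleCleanup {
    private final TwoMethodStyle delegate;
    BooleanAdapter(TwoMethodStyle delegate) { this.delegate = delegate; }
    public void cleanupTask(TaskAttemptContext ctx, boolean promote) {
        if (promote) {
            delegate.commitTask(ctx);
        } else {
            delegate.discardTask(ctx);
        }
    }
}
```

At a call site, {{committer.discardTask(ctx)}} states its intention, whereas {{cleanupTask(ctx, false)}} requires the reader to remember what the flag means.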

*2 New methods in the {{OutputFormat}}*

Instead of adding the following 4 methods to the {{OutputFormat}}:

+  public abstract void setupJob(JobContext context) throws IOException;
+  public abstract void cleanupJob(JobContext context) throws IOException;
+  public abstract void setupTask(TaskAttemptContext taskContext)  throws IOException;
+  public abstract void cleanupTask(TaskAttemptContext taskContext, boolean promote) throws

I would put them in a separate abstract class {{OutputCommitter}} and add to the {{OutputFormat}}
a single abstract method {{OutputCommitter getOutputCommitter()}}. 

The {{FileOutputFormat}} would implement the {{getOutputCommitter()}} method returning a {{FileOutputCommitter}}.
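A rough sketch of how this split could look. The context classes are empty hypothetical stand-ins, the method bodies are placeholders, and the {{commitTask}}/{{discardTask}} pair from point 1 is assumed in place of {{cleanupTask(taskContext, promote)}}; none of this is the actual patch code.

```java
import java.io.IOException;

// Hypothetical empty stand-ins for the real context types.
class JobContext {}
class TaskAttemptContext {}

// The lifecycle hooks live here instead of in OutputFormat.
abstract class OutputCommitter {
    public abstract void setupJob(JobContext context) throws IOException;
    public abstract void cleanupJob(JobContext context) throws IOException;
    public abstract void setupTask(TaskAttemptContext taskContext) throws IOException;
    public abstract void commitTask(TaskAttemptContext taskContext) throws IOException;
    public abstract void discardTask(TaskAttemptContext taskContext) throws IOException;
}

// OutputFormat keeps only output-related methods plus one accessor.
abstract class OutputFormat {
    public abstract OutputCommitter getOutputCommitter();
}

// Placeholder bodies: a real implementation would create, promote, or
// delete the temporary task output files here.
class FileOutputCommitter extends OutputCommitter {
    public void setupJob(JobContext context) throws IOException {}
    public void cleanupJob(JobContext context) throws IOException {}
    public void setupTask(TaskAttemptContext taskContext) throws IOException {}
    public void commitTask(TaskAttemptContext taskContext) throws IOException {}
    public void discardTask(TaskAttemptContext taskContext) throws IOException {}
}

class FileOutputFormat extends OutputFormat {
    public OutputCommitter getOutputCommitter() {
        return new FileOutputCommitter();
    }
}
```

An author of a custom {{OutputFormat}} then implements one extra method and can reuse an existing committer instead of reimplementing four lifecycle hooks.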

When {{Task.done(Umbilical)}} takes care of the side files, it would instantiate a {{FileOutputCommitter}}
(taking the class from a config property) and do the commit for the side files.

The pros of this approach are:

* It gets rid of the static commit method in {{FileOutputFormat}} for the special case of
side files.
* It makes commit of side files pluggable as well.
* It reduces the methods in the {{OutputFormat}} to those that are relevant to output, handling
the commit as a separate concern.
* It will be less error-prone for developers creating their own {{OutputFormat}} implementations,
as the separation of concerns is clearer.
* The code in {{Task}} will be simpler.
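As an illustration of the pluggable side-file commit, the following sketch loads a committer class by name from a configuration map. The property key, the {{SideFileCommitter}} interface, and the {{CommitterLoader}} helper are all hypothetical names, and a plain {{Map}} stands in for the job configuration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified committer contract used only for this sketch.
interface SideFileCommitter {
    String commit(String attemptId);
}

// Default used when the configuration does not name another class.
class DefaultSideFileCommitter implements SideFileCommitter {
    public String commit(String attemptId) {
        return "promoted side files of " + attemptId;
    }
}

final class CommitterLoader {
    // Hypothetical property name; the comment above does not specify one.
    static final String KEY = "example.sidefile.committer.class";

    // Instantiates the configured committer via its no-argument constructor.
    static SideFileCommitter load(Map<String, String> conf) {
        String name = conf.containsKey(KEY)
            ? conf.get(KEY)
            : DefaultSideFileCommitter.class.getName();
        try {
            return (SideFileCommitter) Class.forName(name)
                .getDeclaredConstructor()
                .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate committer " + name, e);
        }
    }
}
```

With something along these lines, the task code would only call {{CommitterLoader.load(conf).commit(attemptId)}} and would not need to know which committer is configured.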

> Move task file promotion into the task
> --------------------------------------
>                 Key: HADOOP-3150
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3150
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.19.0
>         Attachments: 3150.patch, patch-3150.txt, patch-3150.txt
> We need to move the task file promotion from the JobTracker to the Task and move it down into the output format.

