hadoop-common-dev mailing list archives

From "Alejandro Abdelnur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3150) Move task file promotion into the task
Date Fri, 18 Jul 2008 10:20:32 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12614686#action_12614686 ]

Alejandro Abdelnur commented on HADOOP-3150:
--------------------------------------------

A couple of comments on the latest patch (JUL/17):

*1 {{cleanupTask}} method*

The name of the {{OutputFormat.cleanupTask(TaskAttemptContext taskContext, boolean promote)}} method
does not match what the method is supposed to do, which makes it unintuitive. I would suggest having
two methods instead, {{commitTask(TaskAttemptContext ctx)}} and {{discardTask(TaskAttemptContext ctx)}},
so that the intention is clear when looking at the methods and their usage
(a boolean that makes the same method do exactly the opposite is confusing).
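
To illustrate the naming point, here is a minimal sketch of the two-method alternative. Only the names {{commitTask}} and {{discardTask}} come from the suggestion above; the {{TaskAttemptContext}} stand-in, {{TaskOutputHandler}} and {{RecordingHandler}} are hypothetical and exist only to keep the sketch self-contained.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the real TaskAttemptContext (assumption for this sketch).
class TaskAttemptContext {
    final String attemptId;
    TaskAttemptContext(String attemptId) { this.attemptId = attemptId; }
}

// Two intention-revealing methods in place of cleanupTask(ctx, promote).
abstract class TaskOutputHandler {
    abstract void commitTask(TaskAttemptContext ctx) throws IOException;
    abstract void discardTask(TaskAttemptContext ctx) throws IOException;
}

// Hypothetical implementation that only records which method was called.
class RecordingHandler extends TaskOutputHandler {
    final List<String> log = new ArrayList<>();

    @Override
    void commitTask(TaskAttemptContext ctx) { log.add("commit " + ctx.attemptId); }

    @Override
    void discardTask(TaskAttemptContext ctx) { log.add("discard " + ctx.attemptId); }
}
```

A call site then reads {{handler.commitTask(ctx)}} or {{handler.discardTask(ctx)}}, so the intent is visible without looking up what {{true}} or {{false}} means.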

*2 New methods in the {{OutputFormat}}*

Instead of adding the following 4 methods to the {{OutputFormat}}:

{code:java}
+  public abstract void setupJob(JobContext context) throws IOException;
+  public abstract void cleanupJob(JobContext context) throws IOException;
+  public abstract void setupTask(TaskAttemptContext taskContext)  throws IOException;
+  public abstract void cleanupTask(TaskAttemptContext taskContext, boolean promote) throws IOException;
{code}

I would put them in a separate abstract class {{OutputCommitter}} and add to the {{OutputFormat}}
a single abstract method {{OutputCommitter getOutputCommitter()}}. 

The {{FileOutputFormat}} would implement the {{getOutputCommitter()}} method returning a {{FileOutputCommitter}}.
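
A rough sketch of how this split could look, assuming the two-method naming from point 1. The context classes are empty hypothetical stand-ins, {{FileOutputFormat}} is made concrete only so the sketch compiles on its own, and the counters in {{FileOutputCommitter}} replace the real file promotion logic.

```java
import java.io.IOException;

// Hypothetical empty stand-ins for the real context classes.
class JobContext { }
class TaskAttemptContext { }

// Commit lifecycle moved out of OutputFormat into its own abstract class.
abstract class OutputCommitter {
    abstract void setupJob(JobContext context) throws IOException;
    abstract void cleanupJob(JobContext context) throws IOException;
    abstract void setupTask(TaskAttemptContext ctx) throws IOException;
    abstract void commitTask(TaskAttemptContext ctx) throws IOException;
    abstract void discardTask(TaskAttemptContext ctx) throws IOException;
}

// OutputFormat keeps a single commit-related method.
abstract class OutputFormat {
    abstract OutputCommitter getOutputCommitter();
}

// Placeholder committer: counts calls instead of promoting or deleting files.
class FileOutputCommitter extends OutputCommitter {
    int committed = 0;
    int discarded = 0;

    @Override void setupJob(JobContext context) { }
    @Override void cleanupJob(JobContext context) { }
    @Override void setupTask(TaskAttemptContext ctx) { }
    @Override void commitTask(TaskAttemptContext ctx) { committed++; }
    @Override void discardTask(TaskAttemptContext ctx) { discarded++; }
}

// Concrete here only for the sketch.
class FileOutputFormat extends OutputFormat {
    @Override
    OutputCommitter getOutputCommitter() { return new FileOutputCommitter(); }
}
```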

When taking care of the side files, {{Task.done(Umbilical)}} would instantiate a {{FileOutputCommitter}}
(taking the class from a config property) and do the commit for the side files.
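
One way the config-driven instantiation might look, as a sketch only. The property name {{sketch.output.committer.class}}, the {{SideFileCommit}} helper and the use of {{java.util.Properties}} in place of the job configuration are all assumptions made for illustration; the {{OutputCommitter}} here is cut down to a single method.

```java
import java.util.Properties;

// Cut-down committer API, just enough for this sketch.
abstract class OutputCommitter {
    abstract void commitTask(String attemptId);
}

// Placeholder default committer that remembers the last attempt it committed.
class FileOutputCommitter extends OutputCommitter {
    String lastCommitted;

    @Override
    void commitTask(String attemptId) { lastCommitted = attemptId; }
}

final class SideFileCommit {
    // Hypothetical property name, not taken from the patch.
    static final String COMMITTER_KEY = "sketch.output.committer.class";

    // Loads the committer class named in the config, defaulting to FileOutputCommitter.
    static OutputCommitter newCommitter(Properties conf) {
        String name = conf.getProperty(COMMITTER_KEY, FileOutputCommitter.class.getName());
        try {
            return Class.forName(name)
                    .asSubclass(OutputCommitter.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate committer " + name, e);
        }
    }
}
```

With no property set, {{SideFileCommit.newCommitter(conf)}} falls back to {{FileOutputCommitter}}; setting the property to another {{OutputCommitter}} subclass is what would make the side-file commit pluggable.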

The pros of this approach are:

* It gets rid of the static commit method in {{FileOutputFormat}} for the special case of
side files.
* It makes commit of side files pluggable as well.
* It reduces the methods in the {{OutputFormat}} to those that are relevant to output, handling
the commit as a separate concern.
* It will make developers creating their own {{OutputFormat}} implementations less prone to errors,
as the separation of concerns is clearer.
* The code in {{Task}} will be simpler.




> Move task file promotion into the task
> --------------------------------------
>
>                 Key: HADOOP-3150
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3150
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.19.0
>
>         Attachments: 3150.patch, patch-3150.txt, patch-3150.txt
>
>
> We need to move the task file promotion from the JobTracker to the Task and move it down into the output format.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

