hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3150) Move task file promotion into the task
Date Mon, 21 Jul 2008 06:17:31 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615154#action_12615154 ]

Owen O'Malley commented on HADOOP-3150:
---------------------------------------

I think you may be right that we want to have an OutputCommitter, but it should *not* be determined
by the OutputFormat. Rather, it should be configured independently. In particular, we can default
it to "FileOutputCommitter" with roughly the current semantics. My concern with having the
OutputFormat create the OutputCommitter is that it makes the API more complex, and the application
may want to write side files even with a non-file output format.

I'd propose something like:
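{code}
public abstract class OutputCommitter {
  // Job-level hooks, called once per job.
  public abstract void setupJob(JobContext context) throws IOException;
  public abstract void commitJob(JobContext context) throws IOException;
  public abstract void abortJob(JobContext context) throws IOException;
  // Task-level hooks, called once per task attempt.
  public abstract void setupTask(TaskAttemptContext context) throws IOException;
  // Lets a task skip the commit round trip when it has nothing to commit.
  public abstract boolean needsTaskCommit(TaskAttemptContext context) throws IOException;
  public abstract void commitTask(TaskAttemptContext context) throws IOException;
  public abstract void abortTask(TaskAttemptContext context) throws IOException;
}

// Default committer with roughly the current file promotion semantics
// (implementations of the abstract methods omitted).
public class FileOutputCommitter extends OutputCommitter {
  // Where a task attempt should write its files before they are committed.
  public Path getWorkPath(Path basePath) throws IOException;
}

public class JobConf {
  // Returns the configured committer, independent of the OutputFormat.
  public OutputCommitter getOutputCommitter();
}
{code}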
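A rough sketch of the point above about configuring the committer independently of the OutputFormat, with FileOutputCommitter as the default. Everything here is a hypothetical, stripped-down stand-in: SketchJobConf replaces JobConf, the committers carry no methods, NoOpCommitter is an invented alternative, and the key name "mapred.output.committer.class" is an assumption, since the comment does not name one.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for the proposed classes; only committer selection is shown.
abstract class OutputCommitter { }

class FileOutputCommitter extends OutputCommitter { }

// Hypothetical alternative committer an application might configure instead.
class NoOpCommitter extends OutputCommitter { }

// Hypothetical, minimal replacement for JobConf backed by a string map.
class SketchJobConf {
  // Assumed configuration key; the proposal does not name one.
  static final String COMMITTER_KEY = "mapred.output.committer.class";
  private final Map<String, String> props = new HashMap<>();

  void set(String key, String value) {
    props.put(key, value);
  }

  // Resolves the committer from the job configuration alone (not from the
  // OutputFormat), defaulting to FileOutputCommitter when nothing is set.
  OutputCommitter getOutputCommitter() {
    String name = props.getOrDefault(COMMITTER_KEY, FileOutputCommitter.class.getName());
    try {
      Class<? extends OutputCommitter> cls =
          Class.forName(name).asSubclass(OutputCommitter.class);
      return cls.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Cannot instantiate committer " + name, e);
    }
  }
}
```

Keeping the choice in the job configuration means an application could, for example, pair a non-file output format with a committer that still promotes side files.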

We need the needsTaskCommit test to optimize the very common case where there is nothing
to commit, and thus no point in a round trip to the JobTracker. The FileOutputFormat would
check whether the OutputCommitter is a FileOutputCommitter and, if so, use that committer's
getWorkPath.
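To illustrate the needsTaskCommit optimization and the FileOutputFormat check described above, here is a minimal sketch under simplifying assumptions: the context objects are reduced to a task attempt id string, TaskRunner and recordOutput are hypothetical helpers, the JobTracker permission request is modeled as a counter, and both the "_temporary/<attempt>" work directory layout and the fallback to the plain output directory are assumptions rather than part of the proposal.

```java
import java.util.HashSet;
import java.util.Set;

// Simplified stand-in: TaskAttemptContext is reduced to an attempt id (assumption).
abstract class OutputCommitter {
  abstract void setupTask(String attemptId);
  abstract boolean needsTaskCommit(String attemptId);
  abstract void commitTask(String attemptId);
}

// Hypothetical file committer: an attempt needs a commit only if it wrote files.
class FileOutputCommitter extends OutputCommitter {
  private final Set<String> wroteOutput = new HashSet<>();
  final Set<String> committed = new HashSet<>();

  // Hypothetical hook: records that an attempt wrote into its work directory.
  void recordOutput(String attemptId) { wroteOutput.add(attemptId); }

  void setupTask(String attemptId) { }
  boolean needsTaskCommit(String attemptId) { return wroteOutput.contains(attemptId); }
  void commitTask(String attemptId) { committed.add(attemptId); }

  // Hypothetical per-attempt work directory layout.
  String getWorkPath(String basePath, String attemptId) {
    return basePath + "/_temporary/" + attemptId;
  }
}

class TaskRunner {
  // Number of times a task had to ask the JobTracker for permission to commit.
  int jobTrackerRoundTrips = 0;

  void finishTask(OutputCommitter committer, String attemptId) {
    if (committer.needsTaskCommit(attemptId)) {
      jobTrackerRoundTrips++;          // stand-in for the "may I commit?" request
      committer.commitTask(attemptId); // assumes permission is granted
    }
    // Otherwise the task finishes without contacting the JobTracker.
  }

  // How a FileOutputFormat might choose where to write.
  static String workDir(OutputCommitter committer, String outputDir, String attemptId) {
    if (committer instanceof FileOutputCommitter) {
      return ((FileOutputCommitter) committer).getWorkPath(outputDir, attemptId);
    }
    return outputDir; // assumed fallback for other committers
  }
}
```

In the common case where an attempt wrote nothing, finishTask returns without the extra request, which is exactly the round trip the needsTaskCommit test is meant to save.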

Thoughts?

> Move task file promotion into the task
> --------------------------------------
>
>                 Key: HADOOP-3150
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3150
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.19.0
>
>         Attachments: 3150.patch, patch-3150.txt, patch-3150.txt
>
>
> We need to move the task file promotion from the JobTracker to the Task and move it down into the output format.
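As a concrete picture of what task file promotion involves, the sketch below moves the files a task attempt wrote in its work directory into the final output directory, as the task itself (rather than the JobTracker) might do when committing. It is a local-filesystem stand-in built on java.nio.file, not Hadoop's FileSystem API; TaskFilePromotion and promote are hypothetical names, and it assumes the work and output directories share a file system so each move is a cheap rename.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: promotes one task attempt's output on the local file system.
class TaskFilePromotion {
  // Moves every entry in workDir into outputDir, deletes the now-empty workDir,
  // and returns the number of entries promoted.
  static int promote(Path workDir, Path outputDir) throws IOException {
    Files.createDirectories(outputDir);
    int promoted = 0;
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(workDir)) {
      for (Path entry : entries) {
        // A rename when both directories share a file system; an existing
        // file of the same name (e.g. from an earlier attempt) is replaced.
        Files.move(entry, outputDir.resolve(entry.getFileName()),
            StandardCopyOption.REPLACE_EXISTING);
        promoted++;
      }
    }
    Files.delete(workDir);
    return promoted;
  }
}
```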

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

