hadoop-common-dev mailing list archives

From "Alejandro Abdelnur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3150) Move task file promotion into the task
Date Sun, 06 Jul 2008 03:16:02 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12610719#action_12610719 ]

Alejandro Abdelnur commented on HADOOP-3150:

There are a few different topics being discussed in this issue:

# Moving the responsibility for committing the output of a task from the JT to the Task
# Making the commit of a task's output generic rather than HDFS-specific
# Being able to create side OutputStreams (not RecordWriters) from a task

IMO this issue should only address the *first topic*. The gain is that the JT no longer performs
the task output commit itself; it is left only with coordinating it.

The *third topic*, as has been suggested, could be addressed by Hadoop-3149 by adding
a static method {{getOutputStream(JobConf conf, String baseName)}}. This method would use
the filename namespacing introduced by Hadoop-3149 (previously Hadoop-3258) to create a unique
file under the job working output directory. Note that {{MultipleOutputs}} does not implement
{{OutputFormat}}; because of this, IMO, we are not overloading it with unrelated behavior.
{{MultipleOutputs}} just becomes a means to create additional outputs, either {{OutputFormat}}s or
{{OutputStream}}s, in the context of the output of a task, handled consistently with the
task output on both successful completion and failure.
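
As a rough illustration of the {{getOutputStream}} idea, here is a minimal sketch in plain Java. It is not Hadoop code and not the Hadoop-3149 patch: a {{java.nio.file.Path}} stands in for the {{JobConf}}, the task attempt id is passed in explicitly, and the {{baseName-attemptId}} naming scheme is an assumption made only for this example.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/**
 * Hypothetical stand-in for the proposed static helper. NOT Hadoop code:
 * a plain Path replaces JobConf and the attempt id is an explicit argument.
 */
final class SideOutputSketch {

    private SideOutputSketch() {}

    /** Assumed naming scheme: the base name is namespaced by the task attempt id. */
    static String uniqueName(String baseName, String taskAttemptId) {
        return baseName + "-" + taskAttemptId;
    }

    /**
     * Opens a raw stream under the attempt's working output directory, so the
     * file shares the fate of the rest of the task output on success or failure.
     */
    static OutputStream getOutputStream(Path workOutputDir, String taskAttemptId, String baseName)
            throws IOException {
        Files.createDirectories(workOutputDir);
        Path file = workOutputDir.resolve(uniqueName(baseName, taskAttemptId));
        // CREATE_NEW fails if the file already exists, so a name collision is an error.
        return Files.newOutputStream(file, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
    }
}
```

Because the stream lives under the working output directory, whatever promotes or discards that directory on task completion also covers the side file; the helper itself does no committing.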

The *second topic* is a whole subject of its own and I think it should be left to its own Jira:

# It should make the commit of a task output independent of HDFS
# It should handle the commit of a task output atomically (at least with respect to each individual
storage the outputs go to)
# It should not leave the commit to the {{OutputFormat}}, as jobs can use their own output
formats. IMO it should be something like a {{TaskOutputCommitter}} for each storage type that
is part of the Hadoop code (cannot be set by a job) and is run once per storage instance used
by the task (ideally in a transaction-like style).
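
The three points above could be sketched roughly as follows. Only the name {{TaskOutputCommitter}} comes from this comment; the {{commit}}/{{abort}} methods, the {{LocalDirCommitter}} implementation and the {{TaskCommitDriver}} class are hypothetical, and a local directory rename stands in for whatever a real storage type would do.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical per-storage commit contract; the method names are assumptions. */
interface TaskOutputCommitter {
    void commit() throws IOException; // make the attempt's output visible
    void abort() throws IOException;  // discard the attempt's output
}

/** Local-filesystem stand-in: output is staged in an attempt directory and promoted by a rename. */
final class LocalDirCommitter implements TaskOutputCommitter {
    private final Path attemptDir;
    private final Path finalDir;

    LocalDirCommitter(Path attemptDir, Path finalDir) {
        this.attemptDir = attemptDir;
        this.finalDir = finalDir;
    }

    @Override
    public void commit() throws IOException {
        // ATOMIC_MOVE throws AtomicMoveNotSupportedException instead of falling back to a copy.
        Files.move(attemptDir, finalDir, StandardCopyOption.ATOMIC_MOVE);
    }

    @Override
    public void abort() throws IOException {
        if (!Files.exists(attemptDir)) {
            return;
        }
        // Delete children before their parents by visiting paths in reverse order.
        try (Stream<Path> walk = Files.walk(attemptDir)) {
            List<Path> paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            for (Path p : paths) {
                Files.delete(p);
            }
        }
    }
}

/** Framework-side driver: the job supplies no commit logic of its own. */
final class TaskCommitDriver {
    private TaskCommitDriver() {}

    /** Runs each storage instance's committer exactly once, committing or aborting. */
    static void finish(List<TaskOutputCommitter> committers, boolean taskSucceeded) throws IOException {
        for (TaskOutputCommitter c : committers) {
            if (taskSucceeded) {
                c.commit();
            } else {
                c.abort();
            }
        }
    }
}
```

Note that this sketch is atomic only per storage instance: if a later committer fails after an earlier one has committed, nothing is rolled back. A transaction-like style across several storages would need more than this loop.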

> Move task file promotion into the task
> --------------------------------------
>                 Key: HADOOP-3150
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3150
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.19.0
>         Attachments: 3150.patch
> We need to move the task file promotion from the JobTracker to the Task and move it down into the output format.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
