Return-Path: Delivered-To: apmail-hadoop-core-dev-archive@www.apache.org Received: (qmail 88479 invoked from network); 6 Jul 2008 03:19:54 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2) by minotaur.apache.org with SMTP; 6 Jul 2008 03:19:54 -0000 Received: (qmail 29042 invoked by uid 500); 6 Jul 2008 03:19:53 -0000 Delivered-To: apmail-hadoop-core-dev-archive@hadoop.apache.org Received: (qmail 29016 invoked by uid 500); 6 Jul 2008 03:19:53 -0000 Mailing-List: contact core-dev-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: core-dev@hadoop.apache.org Delivered-To: mailing list core-dev@hadoop.apache.org Received: (qmail 29005 invoked by uid 99); 6 Jul 2008 03:19:53 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 05 Jul 2008 20:19:53 -0700 X-ASF-Spam-Status: No, hits=-2000.0 required=10.0 tests=ALL_TRUSTED X-Spam-Check-By: apache.org Received: from [140.211.11.140] (HELO brutus.apache.org) (140.211.11.140) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 06 Jul 2008 03:19:10 +0000 Received: from brutus (localhost [127.0.0.1]) by brutus.apache.org (Postfix) with ESMTP id 000EB29A0014 for ; Sat, 5 Jul 2008 20:19:31 -0700 (PDT) Message-ID: <933919151.1215314371991.JavaMail.jira@brutus> Date: Sat, 5 Jul 2008 20:19:31 -0700 (PDT) From: "Alejandro Abdelnur (JIRA)" To: core-dev@hadoop.apache.org Subject: [jira] Issue Comment Edited: (HADOOP-3150) Move task file promotion into the task In-Reply-To: <1005772078.1207083747584.JavaMail.jira@brutus> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Virus-Checked: Checked by ClamAV on apache.org [ https://issues.apache.org/jira/browse/HADOOP-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12610719#action_12610719 ] tucu00 edited comment on HADOOP-3150 at 7/5/08 8:18 PM: 
--------------------------------------------------------------------

There are a few different topics being discussed in this issue:

# Moving the responsibility for committing the output of a task from the JT to the Task
# Making the commit of a task's output generic, not HDFS-specific
# Being able to create side {{OutputStream}} s (not {{RecordWriters}} ) from a task

IMO this issue should only address the *first topic*. The gain is that the JT no longer performs the task output commit itself; it only coordinates it.

The *third topic*, as has been suggested, could be addressed by Hadoop-3149, by adding a static method {{getOutputStream(JobConf conf, String baseName)}}. This method would use the filename namespacing introduced by Hadoop-3149 (previously Hadoop-3258) to create a unique file under the job working output directory. Note that {{MultipleOutputs}} does not implement {{OutputFormat}}; because of this, IMO, we are not overloading it with unrelated behavior. {{MultipleOutputs}} just becomes a means to create additional outputs, {{OutputFormat}} s or {{OutputStream}} s, in the context of the output of a task, handled consistently with the rest of the task output on both successful completion and failure.

The *second topic* is a whole thing on its own and I think it should be left to its own Jira:

# It should make the commit of a task output independent of HDFS
# It should handle the commit of a task output atomically (at least against each individual storage the outputs go to)
# It should not leave the commit to the {{OutputFormat}}, as jobs can use their own output formats. IMO it should be something like a {{TaskOutputCommitter}} for each storage type that is part of the Hadoop code (cannot be set by a job) and is run once per storage instance used by the task (ideally in a transaction-like style).
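To make the commit idea above concrete, here is a rough sketch in plain Java, not Hadoop code. Everything in it is an assumption made for illustration: the methods on {{TaskOutputCommitter}} (only the name appears in the comment), the {{LocalTaskOutputCommitter}} class, the {{_temporary}} working directory, and the {{baseName-taskId}} naming scheme. It uses the local filesystem as a stand-in for one storage type: a task writes side files into its own working directory, and they only become visible in the job output when the task commits.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical interface: the name comes from the comment, the methods are assumed.
interface TaskOutputCommitter {
    void commit() throws IOException; // promote the task's files into the job output
    void abort() throws IOException;  // discard everything the task wrote
}

// Local-filesystem stand-in for one storage type. Not Hadoop code.
final class LocalTaskOutputCommitter implements TaskOutputCommitter {
    private final Path jobOutputDir;
    private final String taskId;
    private final Path workDir;

    LocalTaskOutputCommitter(Path jobOutputDir, String taskId) throws IOException {
        this.jobOutputDir = jobOutputDir;
        this.taskId = taskId;
        // Uncommitted output lives in a per-task working directory.
        this.workDir = jobOutputDir.resolve("_temporary").resolve(taskId);
        Files.createDirectories(workDir);
    }

    // Side stream whose file name is namespaced by the task id (assumed naming scheme).
    OutputStream getOutputStream(String baseName) throws IOException {
        return Files.newOutputStream(workDir.resolve(baseName + "-" + taskId));
    }

    // Snapshot the working directory so it is not modified while being listed.
    private List<Path> workFiles() throws IOException {
        try (Stream<Path> files = Files.list(workDir)) {
            return files.collect(Collectors.toList());
        }
    }

    @Override
    public void commit() throws IOException {
        // Each rename is atomic on its own; the set of renames as a whole is not.
        for (Path file : workFiles()) {
            Files.move(file, jobOutputDir.resolve(file.getFileName()),
                       StandardCopyOption.ATOMIC_MOVE);
        }
        Files.delete(workDir);
    }

    @Override
    public void abort() throws IOException {
        for (Path file : workFiles()) {
            Files.delete(file);
        }
        Files.delete(workDir);
    }
}

final class TaskCommitSketch {
    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("job-output");
        LocalTaskOutputCommitter task = new LocalTaskOutputCommitter(out, "m-00000");
        try (OutputStream side = task.getOutputStream("side")) {
            side.write("hello".getBytes(StandardCharsets.UTF_8));
        }
        task.commit(); // the side file appears in the job output only now
        System.out.println(Files.exists(out.resolve("side-m-00000")));
    }
}
```

In this sketch the task itself runs {{commit()}} or {{abort()}}; a coordinator (the JT in the discussion above) would only decide which of the two a given task is allowed to call. The per-file rename also shows why a transaction-like commit across several files or storages needs more than this.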

was (Author: tucu00):

There are a few different topics being discussed in this issue:

# Moving the responsibility for committing the output of a task from the JT to the Task
# Making the commit of a task's output generic, not HDFS-specific
# Being able to create side OutputStreams (not RecordWriters) from a task

IMO this issue should only address the *first topic*. The gain is that the JT no longer performs the task output commit itself; it only coordinates it.

The *third topic*, as has been suggested, could be addressed by Hadoop-3149, by adding a static method {{getOutputStream(JobConf conf, String baseName)}}. This method would use the filename namespacing introduced by Hadoop-3149 (previously Hadoop-3258) to create a unique file under the job working output directory. Note that {{MultipleOutputs}} does not implement {{OutputFormat}}; because of this, IMO, we are not overloading it with unrelated behavior. {{MultipleOutputs}} just becomes a means to create additional outputs, {{OutputFormat}}s or {{OutputStream}}s, in the context of the output of a task, handled consistently with the rest of the task output on both successful completion and failure.

The *second topic* is a whole thing on its own and I think it should be left to its own Jira:

# It should make the commit of a task output independent of HDFS
# It should handle the commit of a task output atomically (at least against each individual storage the outputs go to)
# It should not leave the commit to the {{OutputFormat}}, as jobs can use their own output formats. IMO it should be something like a {{TaskOutputCommitter}} for each storage type that is part of the Hadoop code (cannot be set by a job) and is run once per storage instance used by the task (ideally in a transaction-like style).

> Move task file promotion into the task
> --------------------------------------
>
>                 Key: HADOOP-3150
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3150
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.19.0
>
>         Attachments: 3150.patch
>
>
> We need to move the task file promotion from the JobTracker to the Task and move it down into the output format.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.