Mailing-List: contact core-dev-help@hadoop.apache.org; run by ezmlm
Reply-To: core-dev@hadoop.apache.org
Message-ID: <126342739.1233770759748.JavaMail.jira@brutus>
Date: Wed, 4 Feb 2009 10:05:59 -0800 (PST)
From: "Doug Cutting (JIRA)"
To: core-dev@hadoop.apache.org
Subject: [jira] Commented: (HADOOP-4927) Part files on the output filesystem are created irrespective of whether the corresponding task has anything to write there
In-Reply-To: <1373445464.1229925644193.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

    [ https://issues.apache.org/jira/browse/HADOOP-4927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12670392#action_12670392 ]

Doug
Cutting commented on HADOOP-4927:
--------------------------------------

> LazyOutputFormat.set(actualoutputformat.class) and
> job.setOutputFormat(LazyOutputFormat.class)

Right. That's the two-line penalty of a wrapper. If we built it into FileOutputFormat then it would only take one line:

FileOutputFormat.setLazyOutput(true);

but it would then also only work for subclasses of FileOutputFormat, rather than any OutputFormat implementation. This is a tough call, since most, but not all, OutputFormats do subclass FileOutputFormat. I'm leaning towards the wrapper, since, while a bit more complex for users, it is a cleaner layering, making FileOutputFormat less of a kitchen-sink of features.

> Part files on the output filesystem are created irrespective of whether the corresponding task has anything to write there
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4927
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4927
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Devaraj Das
>            Assignee: Jothi Padmanabhan
>             Fix For: 0.21.0
>
>         Attachments: hadoop-4927-v1.patch, hadoop-4927-v2.patch, hadoop-4927.patch
>
>
> When OutputFormat.getRecordWriter is invoked, a part file is created on the output filesystem. But the created RecordWriter is not used until the OutputCollector.collect call is made by the task (user's code). This results in empty part files even if the OutputCollector.collect is never invoked by the corresponding tasks.
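
As a rough illustration of the wrapper approach discussed in this comment, the sketch below defers creation of the underlying writer until the first record is written, so a task that writes nothing never triggers part-file creation. It does not use Hadoop's classes: Format, Writer, RecordingFormat, and LazyFormat are hypothetical stand-ins for OutputFormat, RecordWriter, a concrete output format, and the proposed LazyOutputFormat, and the real patch may differ in every detail.

```java
import java.util.ArrayList;
import java.util.List;

public class LazyOutputSketch {

    /** Hypothetical stand-in for a RecordWriter (not the Hadoop interface). */
    interface Writer {
        void write(String key, String value);
        void close();
    }

    /** Hypothetical stand-in for an OutputFormat: asking for a writer creates the part file. */
    interface Format {
        Writer getWriter(String partName);
    }

    /** Toy format that records which "part files" it was asked to create. */
    static class RecordingFormat implements Format {
        final List<String> createdParts = new ArrayList<>();
        final List<String> records = new ArrayList<>();

        @Override
        public Writer getWriter(String partName) {
            createdParts.add(partName); // eager side effect, like creating an empty part file
            return new Writer() {
                @Override
                public void write(String key, String value) {
                    records.add(key + "\t" + value);
                }

                @Override
                public void close() {
                }
            };
        }
    }

    /** Wrapper that works with any Format and only asks it for a writer on the first write. */
    static class LazyFormat implements Format {
        private final Format delegate;

        LazyFormat(Format delegate) {
            this.delegate = delegate;
        }

        @Override
        public Writer getWriter(String partName) {
            return new Writer() {
                private Writer real; // stays null until the task writes something

                @Override
                public void write(String key, String value) {
                    if (real == null) {
                        real = delegate.getWriter(partName);
                    }
                    real.write(key, value);
                }

                @Override
                public void close() {
                    if (real != null) {
                        real.close();
                    }
                }
            };
        }
    }

    public static void main(String[] args) {
        RecordingFormat base = new RecordingFormat();
        Format lazy = new LazyFormat(base);

        Writer idle = lazy.getWriter("part-00000"); // task that never writes
        idle.close();

        Writer busy = lazy.getWriter("part-00001"); // task that writes one record
        busy.write("k", "v");
        busy.close();

        System.out.println(base.createdParts); // prints [part-00001]
    }
}
```

Because LazyFormat depends only on the Format interface, it wraps any implementation, which mirrors the layering argument above. A flag built into one base class would be shorter to configure but would cover only that class's subclasses.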