From: "Doug Cutting (JIRA)"
To: core-dev@hadoop.apache.org
Reply-To: core-dev@hadoop.apache.org
Date: Tue, 3 Feb 2009 09:41:59 -0800 (PST)
Subject: [jira] Commented: (HADOOP-4927) Part files on the output filesystem are created irrespective of whether the corresponding task has anything to write there
Message-ID: <1898505784.1233682919747.JavaMail.jira@brutus>
In-Reply-To: <1373445464.1229925644193.JavaMail.jira@brutus>

[ https://issues.apache.org/jira/browse/HADOOP-4927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12670028#action_12670028 ]

Doug Cutting commented on HADOOP-4927:
--------------------------------------

> Unless there's a non-FileOutputFormat use case [ ... ]

I see Chris's point and agree. Unless there's a strong reason to put features in the kernel, we should prefer to put them in library code, keeping the kernel minimal. Are there non-FileOutputFormats that need this feature?

A wrapper implementation is more generic but a bit harder to use, since folks would need both to set the job's output format to the wrapper and to set the wrapper's parameter to the real output format: two changes instead of a single parameter.

We could perhaps implement both: a flag for FileOutputFormat, and a wrapper OutputFormat for folks who've not subclassed FileOutputFormat?

> Part files on the output filesystem are created irrespective of whether the corresponding task has anything to write there
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4927
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4927
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Devaraj Das
>            Assignee: Jothi Padmanabhan
>             Fix For: 0.21.0
>
>         Attachments: hadoop-4927-v1.patch, hadoop-4927-v2.patch, hadoop-4927.patch
>
>
> When OutputFormat.getRecordWriter is invoked, a part file is created on the output filesystem. But the created RecordWriter is not used until the task (user's code) calls OutputCollector.collect. This leaves an empty part file for every task that never invokes OutputCollector.collect.
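
To make the wrapper option concrete, here is a minimal sketch of a lazily delegating output format. It does not use Hadoop's actual API: SimpleRecordWriter, SimpleOutputFormat, EagerOutputFormat and LazyWrapperFormat are hypothetical, simplified stand-ins invented for illustration, and the "part file" is only simulated by recording its name in a list. The point is just that the wrapper asks the real format for a writer on the first write, so a task that writes nothing never triggers file creation.

```java
import java.util.ArrayList;
import java.util.List;

// Demo entry point (hypothetical names throughout; not Hadoop's API).
class LazyOutputDemo {
    public static void main(String[] args) {
        EagerOutputFormat real = new EagerOutputFormat();
        SimpleOutputFormat lazy = new LazyWrapperFormat(real);

        // A task that never writes: no part file should be created.
        lazy.getRecordWriter("part-00000").close();

        // A task that writes one record: its part file is created on first write.
        SimpleRecordWriter writer = lazy.getRecordWriter("part-00001");
        writer.write("k", "v");
        writer.close();

        System.out.println(real.createdFiles); // prints [part-00001]
    }
}

// Simplified stand-in for a record writer.
interface SimpleRecordWriter {
    void write(String key, String value);
    void close();
}

// Simplified stand-in for an output format.
interface SimpleOutputFormat {
    SimpleRecordWriter getRecordWriter(String partName);
}

// Simulates the behaviour described in the issue: the part file is
// "created" as soon as getRecordWriter is called, before any record exists.
class EagerOutputFormat implements SimpleOutputFormat {
    final List<String> createdFiles = new ArrayList<>();
    final List<String> records = new ArrayList<>();

    @Override
    public SimpleRecordWriter getRecordWriter(String partName) {
        createdFiles.add(partName); // simulated file creation
        return new SimpleRecordWriter() {
            @Override public void write(String key, String value) {
                records.add(key + "\t" + value);
            }
            @Override public void close() { }
        };
    }
}

// Wrapper that defers to the real format only when the first record arrives.
class LazyWrapperFormat implements SimpleOutputFormat {
    private final SimpleOutputFormat real;

    LazyWrapperFormat(SimpleOutputFormat real) {
        this.real = real;
    }

    @Override
    public SimpleRecordWriter getRecordWriter(String partName) {
        return new SimpleRecordWriter() {
            private SimpleRecordWriter delegate; // stays null until the first write

            @Override public void write(String key, String value) {
                if (delegate == null) {
                    delegate = real.getRecordWriter(partName);
                }
                delegate.write(key, value);
            }

            @Override public void close() {
                if (delegate != null) {
                    delegate.close();
                }
            }
        };
    }
}
```

The sketch also shows the usability cost raised above: the caller has to construct the wrapper and hand it the real format, whereas a flag on the file-based format would be a single setting.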