Message-ID: <26547421.1176225632481.JavaMail.jira@brutus>
Date: Tue, 10 Apr 2007 10:20:32 -0700 (PDT)
From: "Owen O'Malley (JIRA)"
To: hadoop-dev@lucene.apache.org
Subject: [jira] Commented: (HADOOP-1216) Hadoop should support reduce none option
In-Reply-To: <1234988.1175878352329.JavaMail.jira@brutus>

    [ https://issues.apache.org/jira/browse/HADOOP-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487830 ]

Owen O'Malley commented on HADOOP-1216:
---------------------------------------

I think this
would be best represented as specifying JobConf.setNumReduceTasks(0), which would close HADOOP-357. In that case, sending the map output directly to the OutputFormat would be appropriate. If the user doesn't want output from their map, they can just use NullOutputFormat to swallow all outputs. It doesn't add any complexity to the config and fixes a current bug. *smile*

> Hadoop should support reduce none option
> ----------------------------------------
>
>                 Key: HADOOP-1216
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1216
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Runping Qi
>
> This has been a highly desired feature in the streaming world and has been asked for occasionally on the non-streaming side.
> Streaming implemented a working (hacky) solution, but it also creates a discrepancy between the Hadoop
> streaming and non-streaming models. It would be nice if Hadoop offered such a feature
> that works for both streaming and non-streaming. Owen and I discussed this a bit and here is the
> general idea for further discussions/suggestions:
> 1. Allow the user to specify reducer=none in the jobconf.
> 2. The user can still specify the output format and output directory.
> 3. Each mapper will generate an output file in the specified directory. The naming convention can still be like part-xxxxxxxx,
> where xxxxxxxx is the map task number.
> 4. The mapoutput collector of a mapper task will be a record writer on the
> 5. The mapper will call output.collect() to write the output, so the same mapper class can be
> used regardless of whether reducer none is set.
> When reducer is set to none for a job, no mapoutput files will be written to the local file system at all,
> and no data will be shuffled between mappers and reducers. As a matter of fact, the framework may choose
> not to create reducers at all.
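
The behaviour discussed above (zero reduce tasks, map output sent straight to the output format, and a null format to swallow it) can be illustrated with a small self-contained sketch. This is not Hadoop code: RecordSink, ListSink, NullSink, MapCollector and MapOnlySketch are hypothetical names made up for the illustration, and the real framework classes may look quite different. It only shows the idea that the mapper keeps calling collect() while the collector decides where the records go.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an output format's record writer; not the Hadoop API.
interface RecordSink {
    void write(String key, String value);
}

// Keeps records in memory, standing in for a part-xxxxxxxx file in the output directory.
class ListSink implements RecordSink {
    final List<String> records = new ArrayList<>();

    public void write(String key, String value) {
        records.add(key + "\t" + value);
    }
}

// Plays the role the comment assigns to NullOutputFormat: every record is discarded.
class NullSink implements RecordSink {
    public void write(String key, String value) {
        // intentionally empty
    }
}

// The collector handed to the mapper. With zero reduce tasks it writes directly
// to the sink; otherwise it buffers records as if they were headed for a shuffle.
class MapCollector {
    private final int numReduceTasks;
    private final RecordSink directOutput;
    final List<String> shuffleBuffer = new ArrayList<>();

    MapCollector(int numReduceTasks, RecordSink directOutput) {
        this.numReduceTasks = numReduceTasks;
        this.directOutput = directOutput;
    }

    void collect(String key, String value) {
        if (numReduceTasks == 0) {
            directOutput.write(key, value);        // map-only: no local map output files
        } else {
            shuffleBuffer.add(key + "\t" + value); // placeholder for the normal shuffle path
        }
    }
}

class MapOnlySketch {
    // The mapper code is identical whether or not reducers exist.
    static void map(String line, MapCollector output) {
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) {
                output.collect(word, "1");
            }
        }
    }

    public static void main(String[] args) {
        ListSink sink = new ListSink();
        MapCollector mapOnly = new MapCollector(0, sink);
        map("a b a", mapOnly);
        System.out.println("records written directly: " + sink.records.size());
        System.out.println("records buffered for shuffle: " + mapOnly.shuffleBuffer.size());
    }
}
```

Passing a NullSink instead of a ListSink gives the "no output wanted" case without any extra configuration switch, which is the point made in the comment.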