Message-ID: <12328773.1177551435523.JavaMail.jira@brutus>
Date: Wed, 25 Apr 2007 18:37:15 -0700 (PDT)
From: "Runping Qi (JIRA)"
To: hadoop-dev@lucene.apache.org
Subject: [jira] Updated: (HADOOP-1216) Hadoop should support reduce none option
In-Reply-To: <1234988.1175878352329.JavaMail.jira@brutus>

     [ https://issues.apache.org/jira/browse/HADOOP-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Runping Qi updated HADOOP-1216:
-------------------------------

    Status: Patch Available  (was: Open)

> Hadoop should support reduce none option
> ----------------------------------------
>
>                 Key: HADOOP-1216
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1216
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Runping Qi
>         Assigned To: Runping Qi
>         Attachments: patch_1216.txt
>
>
> This has been a highly desired feature in the streaming world and has occasionally been asked for on the non-streaming side.
> Streaming implemented a working (hacky) solution, but it also creates a discrepancy between the Hadoop
> streaming and non-streaming models. It would be nice if Hadoop offered such a feature
> that works for both streaming and non-streaming. Owen and I discussed this a bit, and here is the
> general idea for further discussion/suggestions:
> 1. Allow the user to specify reducer=none in the jobconf.
> 2. The user can still specify the output format and output directory.
> 3. Each mapper will generate an output file in the specified directory. The naming convention can still be like part-xxxxxxxx,
> where xxxxxxxx is the map task number.
> 4. The mapoutput collector of a mapper task will be a record writer on the
> 5. The mapper will call output.collect() to write the output, so the same mapper class can be
> used regardless of whether reducer=none is set.
> When reducer is set to none for a job, no mapoutput files will be written to the local file system at all,
> and no data will be shuffled between mappers and reducers. As a matter of fact, the framework may choose
> not to create reducers at all.
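To make the proposal above concrete, here is a minimal, self-contained Python sketch of the idea. It is not Hadoop code and does not reflect the attached patch. Every name in it (run_map_task, upper_mapper, the reducer argument) is hypothetical, the five-digit zero padding in the file name is an assumption, and because item 4 is cut off in the message, treating the collector as a record writer on the per-task output file is only one reading of it. The point it illustrates is item 5: the mapper only ever calls collect(), so the same mapper works whether or not reducer=none is set.

```python
import os
import tempfile


def upper_mapper(key, value, collect):
    """Hypothetical mapper: it only calls collect() and never sees the sink."""
    collect(key, value.upper())


def run_map_task(task_id, records, mapper, output_dir, reducer="none"):
    """Run one simulated map task.

    With reducer="none", collect() writes straight to a part file in
    output_dir and the file path is returned. Otherwise the records are
    buffered, standing in for the map output a shuffle would consume.
    """
    if reducer == "none":
        # Assumed naming: "part-" plus the zero-padded map task number.
        path = os.path.join(output_dir, "part-%05d" % task_id)
        with open(path, "w", encoding="utf-8") as out:
            def collect(k, v):
                out.write(f"{k}\t{v}\n")

            for k, v in records:
                mapper(k, v, collect)
        return path

    buffered = []

    def collect(k, v):
        buffered.append((k, v))

    for k, v in records:
        mapper(k, v, collect)
    return buffered


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        splits = [[("a", "x")], [("b", "y")]]
        for task_id, split in enumerate(splits):
            run_map_task(task_id, split, upper_mapper, out_dir)
        print(sorted(os.listdir(out_dir)))  # ['part-00000', 'part-00001']
```

In this sketch each map task produces exactly one output file and nothing is buffered for a reducer, which mirrors the "no mapoutput files, no shuffling" behaviour described in the message.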