From: "Runping Qi"
Reply-To: hadoop-dev@lucene.apache.org
Subject: Real use scenario of streaming with Reduce=None
Date: Fri, 20 Apr 2007 16:24:52 -0700

With HADOOP-1216, the framework will support the reduce=none feature through the setting numReduceTasks=0. If a map/reduce job sets numReduceTasks=0, the framework will not create any reducer tasks, and the mappers will not generate map output files either. Instead, each mapper will create one DFS file in the output dir specified for the job and save its output to that file as part of the final result. This behavior will be the same whether a job is streaming or non-streaming.

I wonder whether this behavior serves all the needs of the current streaming user community. If so, we can eliminate all the weird "features" currently hacked into the streaming implementation, such as sending the output of mappers through a socket (i.e. the useSingleSideOutputURI_ option).

Thoughts?

Runping
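
To make the numReduceTasks=0 behavior concrete, here is a minimal local sketch in Python. It is not Hadoop code: the names run_map_only_job and upper_mapper are hypothetical, a local directory stands in for the DFS output dir, and the "part-NNNNN" file naming is an assumption used only for illustration. It shows the one property described above: with no reducers there is no intermediate map output, and each map task writes its records straight into its own file in the job output directory.

```python
import os
import tempfile


def run_map_only_job(input_splits, mapper, output_dir):
    """Simulate a job with numReduceTasks=0 (hypothetical helper, not a Hadoop API).

    Each input split is handled by one map task. There is no shuffle and no
    reducer: every map task writes its records directly to its own file in
    output_dir, and those files together form the final result.
    """
    os.makedirs(output_dir, exist_ok=True)
    part_files = []
    for task_id, split in enumerate(input_splits):
        # One output file per map task; the "part-NNNNN" name is an assumption.
        path = os.path.join(output_dir, "part-%05d" % task_id)
        with open(path, "w") as out:
            for record in split:
                for key, value in mapper(record):
                    out.write("%s\t%s\n" % (key, value))
        part_files.append(path)
    return part_files


def upper_mapper(line):
    """Toy mapper: emit the upper-cased line as key and its length as value."""
    yield line.upper(), len(line)


if __name__ == "__main__":
    splits = [["a", "bb"], ["ccc"]]  # two splits, so two map tasks
    with tempfile.TemporaryDirectory() as out_dir:
        for path in run_map_only_job(splits, upper_mapper, out_dir):
            with open(path) as part:
                print(os.path.basename(path), part.read().splitlines())
```

In this sketch two splits produce two output files, one per map task, which is the same shape of result the message describes for both streaming and non-streaming jobs.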