From: tienduc_dinh (tienduc_dinh@yahoo.com)
To: core-user@hadoop.apache.org
Date: Sun, 11 Jan 2009 05:23:58 -0800 (PST)
Subject: Re: Merging reducer outputs into a single part-00000 file

A single part-00000 file means there is only one reduce task in your configuration. Hadoop does not merge reducer outputs. Each reduce task writes its own file (part-00000, part-00001, ...), so a job with R reduce tasks produces R part files. The number of reduce tasks defaults to 1 (mapred.reduce.tasks) and can be raised with JobConf.setNumReduceTasks().

Hope this helps.

Tien Duc Dinh


Jim Twensky wrote:
>
> Hello,
>
> The original map-reduce paper states: "After successful completion, the
> output of the map-reduce execution is available in the R output files (one
> per reduce task, with file names as specified by the user)." However, when
> using Hadoop's TextOutputFormat, all the reducer outputs are combined in a
> single file called part-00000. I was wondering how and when this merging
> process is done. When the reducer calls output.collect(key,value), is this
> record written to a local temporary output file on the reducer's disk, and
> are these local files (a total of R) later merged into one single file
> by a final thread? Or is it written directly to the final output file
> (part-00000)? I am asking this because I'd like to get an ordered sample
> of the final output data, i.e. one record per every 1000 records or something
> similar, and I don't want to run a serial process that iterates over the
> final output file.
>
> Thanks,
> Jim

--
View this message in context: http://www.nabble.com/Merging-reducer-outputs-into-a-single-part-00000-file-tp21396867p21399089.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
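This is a minimal plain-Java simulation of the point above. It uses no Hadoop classes. Each simulated reduce task sorts the keys routed to it and writes its own part file, and no step merges the files afterwards. The partition function is modeled on the formula used by Hadoop's default HashPartitioner, `(hashCode & Integer.MAX_VALUE) % numReduceTasks`. It is applied to `String.hashCode()` here, not to Hadoop's `Text`, so the key-to-reducer mapping will differ from a real job. The class and method names (`ReduceOutputSketch`, `runReducers`, `sampleEveryNth`) are hypothetical, and the sketch assumes distinct string keys. It also sketches Jim's sampling idea: a reducer keeps every Nth record of its own sorted output while writing, instead of re-reading the finished file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class ReduceOutputSketch {

    // Hypothetical helper: names reduce task i's output file in the part-00000 style.
    static String partFileName(int taskId) {
        return String.format(Locale.ROOT, "part-%05d", taskId);
    }

    // Modeled on the HashPartitioner formula; applied to String.hashCode() here.
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Simulates the reduce side: each task sorts its own keys and writes them
    // to its own file. Nothing merges the files afterwards.
    static Map<String, List<String>> runReducers(List<String> keys, int numReduceTasks) {
        if (numReduceTasks <= 0) {
            throw new IllegalArgumentException("need at least one reduce task");
        }
        List<TreeSet<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            partitions.add(new TreeSet<>()); // sorted, one entry per distinct key
        }
        for (String key : keys) {
            partitions.get(partition(key, numReduceTasks)).add(key);
        }
        Map<String, List<String>> files = new TreeMap<>();
        for (int i = 0; i < numReduceTasks; i++) {
            files.put(partFileName(i), new ArrayList<>(partitions.get(i)));
        }
        return files;
    }

    // Keeps every n-th record of one reducer's sorted output, starting with the first.
    static List<String> sampleEveryNth(List<String> sortedRecords, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> sample = new ArrayList<>();
        for (int i = 0; i < sortedRecords.size(); i += n) {
            sample.add(sortedRecords.get(i));
        }
        return sample;
    }

    public static void main(String[] args) {
        // 5000 distinct zero-padded keys, so string order matches numeric order.
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 5000; i++) {
            keys.add(String.format(Locale.ROOT, "key-%05d", i));
        }

        // One reduce task: a single file.
        Map<String, List<String>> oneReducer = runReducers(keys, 1);
        System.out.println(oneReducer.keySet()); // [part-00000]

        // Three reduce tasks: three independent files.
        Map<String, List<String>> threeReducers = runReducers(keys, 3);
        System.out.println(threeReducers.keySet()); // [part-00000, part-00001, part-00002]

        // Every 1000th record of the single reducer's output.
        System.out.println(sampleEveryNth(oneReducer.get("part-00000"), 1000));
        // [key-00000, key-01000, key-02000, key-03000, key-04000]
    }
}
```

With hash partitioning, each part file is sorted by key, but the files are not ordered relative to one another. Per-reducer samples are therefore only sorted within each file, and "every 1000th record" is counted per reducer, not over the whole output. A globally ordered sample built this way would need a partitioner that assigns contiguous key ranges to reducers. Hadoop's TotalOrderPartitioner is one option for that.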