From: "Jim Twensky"
To: core-user@hadoop.apache.org
Date: Wed, 14 Jan 2009 18:33:55 -0600
Subject: Re: Merging reducer outputs into a single part-00000 file

Owen and Rasit,

Thank you for the responses. I've figured out that mapred.reduce.tasks was
set to 1 in my hadoop-default.xml and I didn't override it in my
hadoop-site.xml configuration file.

Jim

On Wed, Jan 14, 2009 at 11:23 AM, Owen O'Malley wrote:

> On Jan 14, 2009, at 12:46 AM, Rasit OZDAS wrote:
>
>> Jim,
>>
>> As far as I know, there is no operation done after Reducer.
>>
>
> Correct, other than output promotion, which moves the output file to the
> final filename.
>
>> But if you are a little experienced, you already know these.
>> Ordered list means one final file, or am I missing something?
>>
>
> There is no value and a lot of cost associated with creating a single file
> for the output. The question is how you want the keys divided between the
> reduces (and therefore output files).
> The default partitioner hashes the key and mods by the number of
> reduces, which "stripes" the keys across the output files. You can use
> the mapred.lib.InputSampler to generate good partition keys and
> mapred.lib.TotalOrderPartitioner to get completely sorted output based
> on the partition keys. With the total order partitioner, each reduce
> gets an increasing range of keys and thus has all of the nice
> properties of a single file without the costs.
>
> -- Owen
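The following small, self-contained Java sketch contrasts the two partitioning schemes Owen describes. It has no Hadoop dependency and is not Hadoop's API. It assumes plain String keys instead of Hadoop Writables. hashPartition follows the "hash the key and mod by the number of reduces" description (the same formula Hadoop's HashPartitioner uses). chooseSplitPoints and rangePartition are hypothetical, simplified stand-ins for what InputSampler and TotalOrderPartitioner do. Here numReduces plays the role of mapred.reduce.tasks, so each partition corresponds to one part-NNNNN output file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

public class PartitionSketch {

    // Hash partitioning: mask off the sign bit so the result is non-negative,
    // then mod by the number of reduces. Keys get "striped" across outputs.
    static int hashPartition(String key, int numReduces) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduces;
    }

    // Hypothetical, simplified sampler: draw random keys, sort them, and take
    // numReduces - 1 evenly spaced samples as split points.
    static String[] chooseSplitPoints(List<String> keys, int numReduces, int sampleSize, long seed) {
        Random rnd = new Random(seed);
        String[] sample = new String[sampleSize];
        for (int i = 0; i < sampleSize; i++) {
            sample[i] = keys.get(rnd.nextInt(keys.size()));
        }
        Arrays.sort(sample);
        String[] splits = new String[numReduces - 1];
        for (int i = 1; i < numReduces; i++) {
            splits[i - 1] = sample[i * sampleSize / numReduces];
        }
        return splits;
    }

    // Range partitioning: reduce i receives keys in [splits[i-1], splits[i]),
    // so reduce numbers increase with the keys.
    static int rangePartition(String key, String[] splits) {
        int pos = Arrays.binarySearch(splits, key);
        // binarySearch returns -(insertionPoint) - 1 when the key is absent.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    // True if concatenating the per-reduce outputs in order is sorted.
    static boolean isConcatSorted(List<List<String>> parts) {
        String prev = null;
        for (List<String> part : parts) {
            for (String k : part) {
                if (prev != null && prev.compareTo(k) > 0) return false;
                prev = k;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int numReduces = 3; // stands in for mapred.reduce.tasks
        List<String> keys = new ArrayList<>();
        for (char c = 'a'; c <= 'z'; c++) {
            keys.add("key-" + c);
        }
        String[] splits = chooseSplitPoints(keys, numReduces, 12, 42L);

        // One list per simulated output file.
        List<List<String>> hashOut = new ArrayList<>();
        List<List<String>> rangeOut = new ArrayList<>();
        for (int i = 0; i < numReduces; i++) {
            hashOut.add(new ArrayList<>());
            rangeOut.add(new ArrayList<>());
        }
        for (String k : keys) {
            hashOut.get(hashPartition(k, numReduces)).add(k);
            rangeOut.get(rangePartition(k, splits)).add(k);
        }
        // Each reduce sees its own keys in sorted order.
        for (int i = 0; i < numReduces; i++) {
            hashOut.get(i).sort(null);
            rangeOut.get(i).sort(null);
        }

        System.out.println("split points: " + Arrays.toString(splits));
        for (int i = 0; i < numReduces; i++) {
            System.out.printf("part-%05d hash=%s range=%s%n", i, hashOut.get(i), rangeOut.get(i));
        }
        System.out.println("hash output globally sorted: " + isConcatSorted(hashOut));
        System.out.println("range output globally sorted: " + isConcatSorted(rangeOut));
    }
}
```

With hash partitioning, neighbouring keys typically land in different files. With range partitioning, reading part-00000, part-00001, ... in order always yields a sorted sequence. That is the "nice properties of a single file" Owen refers to, without forcing everything through one reduce.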