From: arkady borkovsky
Subject: Re: Pre-sort value list in reduce
Date: Tue, 15 Apr 2008 00:32:30 -0700
To: core-dev@hadoop.apache.org

look at
-partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner

--ab

On Apr 14, 2008, at 4:25 PM, pi song wrote:

> Dear people in Hadoop mailing list,
>
> Is there any way to control the value list in reduce (Key, List of values)
> to be sorted? or at least clusteringly sorted (containing clusters of sorted
> values e.g. 1,1,1,2,2,2,2,3,3,3, 1,1,1,1,1,1,2,2,2,2,3,
> 1,1,2,2,2,3,3,3,3,3,3,3) ?
> I had a look at JobConf.setOutputValueGroupingComparator in javadoc and I
> think it might be the answer because I feel most of the time grouping in
> Hadoop is done by sort. Am I right?
>
> Can anyone help me? How about the performance impact of your solution?
>
> Thanks in advance,
> Pi
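
For readers of the archive, here is a rough sketch of the idea behind the partitioner suggestion, usually called a secondary sort. The value to be sorted is moved into the key, records are partitioned on the leading key field only, and the framework's ordinary sort on the full key then delivers each group's values in order. This is a plain Python simulation, not Hadoop code. The function names, the "." separator, and the toy hash are all assumptions made for illustration. The real KeyFieldBasedPartitioner uses its own hashing and is configured through job properties whose names depend on the Hadoop version.

```python
from itertools import groupby

SEP = "."  # assumed separator between the primary and secondary key fields


def partition(key, num_reducers):
    """Choose a reducer from the primary field only (toy hash, not Hadoop's)."""
    primary = key.split(SEP)[0]
    return sum(primary.encode()) % num_reducers


def shuffle(records, num_reducers):
    """Route (composite_key, payload) pairs to reducers, then sort each
    reducer's input on the full composite key, as the framework's sort would."""
    buckets = [[] for _ in range(num_reducers)]
    for key, payload in records:
        buckets[partition(key, num_reducers)].append((key, payload))
    return [sorted(bucket, key=lambda kv: kv[0]) for bucket in buckets]


def reduce_groups(sorted_records):
    """Group consecutive records by primary field. Because the input is sorted
    on the full key, each group's values arrive ordered by secondary field."""
    by_primary = lambda kv: kv[0].split(SEP)[0]
    for primary, group in groupby(sorted_records, key=by_primary):
        yield primary, [(key.split(SEP, 1)[1], payload) for key, payload in group]


# Hypothetical map output: "primary.secondary" keys in arbitrary order.
records = [("a.3", "x"), ("b.1", "y"), ("a.1", "z"), ("a.2", "w"), ("b.2", "v")]

result = {}
for bucket in shuffle(records, num_reducers=2):
    for primary, values in reduce_groups(bucket):
        result[primary] = values
```

Note that the sort here is lexical on strings, so "10" would sort before "2". Numeric ordering would need zero-padded fields or a numeric comparator. On cost: the ordering comes from the sort the shuffle already performs, so the main extra expense in this scheme is the larger composite key.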