From: Harsh J
Date: Sun, 30 Jan 2011 16:11:17 +0530
Subject: Re: sort the values in reduce side
To: mapreduce-user@hadoop.apache.org

The reduce's value iterator gives you a reference to a single object
that is reused for every value it returns, so a list of those references
ends up holding the same object many times over. If you must build an
entire collection in memory to sort, store a copy of each value rather
than the reference itself, for example new LongWritable(val.get())
(WritableUtils.clone() is the general-purpose option for other Writable
types). Alternatively, you could explore how MapReduce itself can sort
for you with comparators/groupers, which is more efficient.

On Sun, Jan 30, 2011 at 3:36 PM, exception wrote:
> Hi,
>
> I am running a simple inverted index generating program in Hadoop which
> will emit every word in a text file as well as its offsets.
>
> So the output key is Text and the output value is a list of LongWritable.
>
> What I am trying to do is sort the offsets in the reduce function. For
> each key, I put every value into a List and sort using Collections.sort().
>
> This is the code snippet:
>
>     offsetList.clear();
>     for (LongWritable val : values)
>     {
>         offsetList.add(val);
>     }
>     Collections.sort(offsetList);
>
>     for (LongWritable offset : offsetList)
>     {
>         ……
>     }
>
> But it doesn't work. It looks like all the elements in offsetList have
> been overwritten by the smallest value in values. offsetList and values
> have the same size.
>
> Can I sort the data in this way?
>
> Thanks.

-- 
Harsh J
www.harshj.com
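
To illustrate the object-reuse behaviour discussed in this thread, here is a minimal, Hadoop-free sketch. MutableLong and reusingValues are hypothetical stand-ins (not Hadoop classes) for LongWritable and the reducer's value iterator; the only assumption is that the iterator hands back one shared, refilled object, as the reply describes. In a real reducer the copy step would likely be offsetList.add(new LongWritable(val.get())).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class ReuseDemo {
    /** Hypothetical stand-in for a mutable value type such as LongWritable. */
    static final class MutableLong implements Comparable<MutableLong> {
        private long value;
        MutableLong(long value) { this.value = value; }
        long get() { return value; }
        void set(long value) { this.value = value; }
        @Override public int compareTo(MutableLong other) {
            return Long.compare(value, other.value);
        }
    }

    /** Mimics the reducer's values: every next() returns the SAME object, refilled. */
    static Iterable<MutableLong> reusingValues(long... raw) {
        final MutableLong shared = new MutableLong(0);
        return () -> new Iterator<MutableLong>() {
            private int i = 0;
            @Override public boolean hasNext() { return i < raw.length; }
            @Override public MutableLong next() {
                if (!hasNext()) throw new NoSuchElementException();
                shared.set(raw[i++]);
                return shared;
            }
        };
    }

    /** The pattern from the question: stores the shared reference itself. */
    static List<Long> sortedWithoutCopy(Iterable<MutableLong> values) {
        List<MutableLong> list = new ArrayList<>();
        for (MutableLong v : values) list.add(v);
        Collections.sort(list);
        return unwrap(list);
    }

    /** The fix: store an independent copy of each value before sorting. */
    static List<Long> sortedWithCopy(Iterable<MutableLong> values) {
        List<MutableLong> list = new ArrayList<>();
        for (MutableLong v : values) list.add(new MutableLong(v.get()));
        Collections.sort(list);
        return unwrap(list);
    }

    /** Converts the holders to plain Longs for easy printing and comparison. */
    static List<Long> unwrap(List<MutableLong> list) {
        List<Long> out = new ArrayList<>();
        for (MutableLong m : list) out.add(m.get());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(sortedWithoutCopy(reusingValues(30, 10, 20))); // [20, 20, 20]
        System.out.println(sortedWithCopy(reusingValues(30, 10, 20)));    // [10, 20, 30]
    }
}
```

Without the copy, the list has the right size but every slot points at the one shared object, so all entries show whichever value was written last. Copying costs one small allocation per value, which is fine for modest groups; for very large groups the comparator/grouper route mentioned in the reply avoids holding everything in memory.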