hadoop-mapreduce-user mailing list archives

From Harsh J <qwertyman...@gmail.com>
Subject Re: sort the values in reduce side
Date Sun, 30 Jan 2011 10:41:17 GMT
The reduce-side value iterator gives you a reference to a single object
that is reused across iterations and across reduce calls, so every
reference you store ends up pointing at the same, most recently filled
object. If you must build an entire collection in memory to sort (you
could explore how MapReduce itself can help sort with
comparators/groupers, which is more efficient), copy each value before
holding it in a list, for example with new LongWritable(val.get()) or
WritableUtils.clone(val, conf).
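
To illustrate the reuse problem without a Hadoop dependency, here is a
minimal sketch in plain Java. MutableLong and reusingValues are
hypothetical stand-ins (not Hadoop classes) that only mimic how the
framework refills one value object per iteration; in a real reducer the
copy step would be something like new LongWritable(val.get()).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

public class ReuseDemo {
    /** Hypothetical stand-in for a mutable Writable such as LongWritable. */
    static final class MutableLong implements Comparable<MutableLong> {
        private long value;
        MutableLong(long value) { this.value = value; }
        long get() { return value; }
        void set(long value) { this.value = value; }
        @Override public int compareTo(MutableLong o) {
            return Long.compare(value, o.value);
        }
    }

    /** Assumed behaviour: one shared object, overwritten on every next(). */
    static Iterable<MutableLong> reusingValues(long... raw) {
        final MutableLong shared = new MutableLong(0);
        return () -> new Iterator<MutableLong>() {
            private int i = 0;
            @Override public boolean hasNext() { return i < raw.length; }
            @Override public MutableLong next() {
                shared.set(raw[i++]);
                return shared;
            }
        };
    }

    /** Buggy pattern: stores the same reference N times. */
    static List<Long> sortedByReference(Iterable<MutableLong> values) {
        List<MutableLong> list = new ArrayList<>();
        for (MutableLong v : values) {
            list.add(v);
        }
        Collections.sort(list);
        return unwrap(list);
    }

    /** Fixed pattern: copies each value before storing it. */
    static List<Long> sortedByCopy(Iterable<MutableLong> values) {
        List<MutableLong> list = new ArrayList<>();
        for (MutableLong v : values) {
            list.add(new MutableLong(v.get()));
        }
        Collections.sort(list);
        return unwrap(list);
    }

    private static List<Long> unwrap(List<MutableLong> list) {
        List<Long> out = new ArrayList<>();
        for (MutableLong m : list) {
            out.add(m.get());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(sortedByReference(reusingValues(30, 10, 20))); // [20, 20, 20]
        System.out.println(sortedByCopy(reusingValues(30, 10, 20)));      // [10, 20, 30]
    }
}
```

In this sketch the list built by reference has the right size but every
element shows the last value written into the shared object, which
matches the "same size, all elements overwritten" symptom described
below.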

On Sun, Jan 30, 2011 at 3:36 PM, exception <exception@taomee.com> wrote:
> Hi,
>
> I am running a simple inverted-index generating program in Hadoop which will
> emit every word in a text file as well as its offsets.
>
> So the output key is Text and output value is a list of LongWritable.
>
> What I am trying to do is sort the offsets in the reduce function. For each
> key, I put every value into a List and sort it using Collections.sort().
>
> This is the code snippet:
>
> offsetList.clear();
> for (LongWritable val : values)
> {
>     offsetList.add(val);
> }
> Collections.sort(offsetList);
>
> for (LongWritable offset : offsetList)
> {
>     ……
> }
>
> But it doesn’t work. Looks like all the elements in offsetList have been
> overwritten by the smallest value in values. offsetList and values have the
> same size.
>
> Can I sort the data in this way?
>
> Thanks.



-- 
Harsh J
www.harshj.com
