hadoop-common-user mailing list archives

From Ted Dunning <tdunn...@veoh.com>
Subject Re: Sorting output data on value
Date Fri, 22 Feb 2008 07:01:06 GMT

But this only guarantees that the values will be sorted within each
reducer's input.  Thus, it won't get the results sorted by the
reducer's output value.
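
To make the distinction concrete, here is a minimal sketch in plain Java
(no Hadoop classes). The class and method names are hypothetical, and the
summing reduce step is only an assumed word-count-style example. It shows
that a reducer which receives its keys in sorted order still emits output
ordered by key, so ordering by the computed value takes a separate sort.
In a real job that would typically be a second pass, which is an
assumption here rather than something stated in this thread.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration; nothing here is Hadoop API.
final class OutputValueOrderSketch {

    // Simulated reduce step: visits keys in sorted order and emits (key, sum).
    static List<Map.Entry<String, Integer>> reduce(Map<String, List<Integer>> input) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> e : new TreeMap<>(input).entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            out.add(new SimpleEntry<String, Integer>(e.getKey(), sum));
        }
        return out;
    }

    // Separate step: re-sorts the emitted records by their output value.
    static List<Map.Entry<String, Integer>> sortByValue(List<Map.Entry<String, Integer>> records) {
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(records);
        sorted.sort(Map.Entry.comparingByValue());
        return sorted;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> input = new HashMap<>();
        input.put("a", List.of(5, 4));
        input.put("b", List.of(1));
        input.put("c", List.of(2, 1));

        List<Map.Entry<String, Integer>> reduced = reduce(input);
        System.out.println(reduced);              // ordered by key:   [a=9, b=1, c=3]
        System.out.println(sortByValue(reduced)); // ordered by value: [b=1, c=3, a=9]
    }
}
```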


On 2/21/08 8:40 PM, "Owen O'Malley" <oom@yahoo-inc.com> wrote:

> 
> On Feb 21, 2008, at 5:47 PM, Ted Dunning wrote:
> 
>> It may be sorted within the output for a single reducer and, indeed,
>> you can even guarantee that it is sorted, but *only* by the reduce
>> key.  The order in which values appear will not be deterministic.
> 
> Actually, there is a better answer for this. If you put both the
> primary and secondary key into the key, you can use
> JobConf.setOutputValueGroupingComparator to set a comparator that
> only compares the primary key. Reduce will be called once per
> primary key, but all of the values will be sorted by the secondary key.
> 
> See http://tinyurl.com/32gld4
> 
> -- Owen
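
The composite-key idea Owen describes can be sketched in plain Java, without
any Hadoop classes. Everything named below (SecondarySortSketch,
CompositeKey, SORT_ORDER, GROUPING, groupSorted) is hypothetical and only
simulates the behaviour: one comparator orders records by primary then
secondary key, and a second comparator, playing the role of the one passed
to JobConf.setOutputValueGroupingComparator, decides where one reduce call
ends and the next begins.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-JDK simulation of secondary sort; not the Hadoop API itself.
final class SecondarySortSketch {

    // Hypothetical composite key: primary and secondary parts travel together.
    static final class CompositeKey {
        final String primary;
        final int secondary;

        CompositeKey(String primary, int secondary) {
            this.primary = primary;
            this.secondary = secondary;
        }
    }

    // Sort comparator: orders by primary key, then by secondary key.
    static final Comparator<CompositeKey> SORT_ORDER =
        Comparator.<CompositeKey, String>comparing(k -> k.primary)
                  .thenComparingInt(k -> k.secondary);

    // Grouping comparator: looks at the primary key only.
    static final Comparator<CompositeKey> GROUPING =
        Comparator.comparing(k -> k.primary);

    // Sorts the keys, then opens a new group whenever GROUPING sees a change.
    // Each group stands in for one reduce call.
    static Map<String, List<Integer>> groupSorted(List<CompositeKey> keys) {
        List<CompositeKey> sorted = new ArrayList<>(keys);
        sorted.sort(SORT_ORDER);

        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        CompositeKey previous = null;
        List<Integer> current = null;
        for (CompositeKey key : sorted) {
            if (previous == null || GROUPING.compare(previous, key) != 0) {
                current = new ArrayList<>();
                groups.put(key.primary, current);
            }
            current.add(key.secondary);
            previous = key;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<CompositeKey> keys = Arrays.asList(
            new CompositeKey("b", 3), new CompositeKey("a", 9),
            new CompositeKey("b", 1), new CompositeKey("a", 2));
        System.out.println(groupSorted(keys)); // {a=[2, 9], b=[1, 3]}
    }
}
```

Each primary key yields exactly one group, and within a group the secondary
keys arrive in ascending order. Note that this orders what goes into each
reduce call; it says nothing about the order of what the reducer emits.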

