hadoop-common-user mailing list archives

From Devaraj Das <d...@yahoo-inc.com>
Subject Re: correct pattern for using setOutputValueGroupingComparator?
Date Tue, 06 Jan 2009 08:15:05 GMT



On 1/6/09 9:47 AM, "Meng Mao" <mengmao@gmail.com> wrote:

> Unfortunately, my team is on 0.15 :(. We are looking to upgrade to 0.18 as
> soon as we upgrade our hardware (long story).
> From comparing the 0.15 and 0.19 mapreduce tutorials, and looking at the
> 4545 patch, I don't see anything that seems majorly different about the
> MapReduce API?
> - There's a Partitioner that's used, but that seems optional?
> - I see that 0.19 still provides setOutputValueGroupingComparator; is the
> setGroupingComparatorClass in the patch from the 0.20 API?
> 
Yes, setGroupingComparatorClass is defined in the new MapReduce API and does
the same thing as setOutputValueGroupingComparator.
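
To illustrate what the grouping comparator does in either API, here is a
minimal sketch in plain Java with no Hadoop dependency. It sorts by the full
composite key but starts a new reduce group only when the id part changes.
The class GroupingSketch, the helper names, and the "id,tag" key layout are
assumptions made for illustration; they are not part of Hadoop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only: simulates sort-then-group without Hadoop.
class GroupingSketch {
    // Assumed key layout: "id,tag", e.g. "id_1,A".
    static String idOf(String key) {
        return key.substring(0, key.indexOf(','));
    }

    // Sort comparator: orders keys by the full composite string.
    static final Comparator<String> SORT = Comparator.naturalOrder();

    // Grouping comparator: looks only at the id part of the key.
    static final Comparator<String> GROUP =
            Comparator.comparing(GroupingSketch::idOf);

    // Returns one list of keys per simulated reduce call, in sorted order.
    static List<List<String>> reduceGroups(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(SORT);
        List<List<String>> groups = new ArrayList<>();
        String prev = null;
        for (String k : sorted) {
            // A new group starts whenever adjacent keys differ under GROUP.
            if (prev == null || GROUP.compare(prev, k) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(k);
            prev = k;
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(reduceGroups(Arrays.asList(
                "id_1,C", "id_2,B", "id_1,-", "id_1,A", "id_2,-")));
    }
}
```

Because "-" sorts before the letters, the metadata key arrives first in each
group, which is the usual reason for pairing a full-key sort with an id-only
grouping comparator.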

> I have an associated question -- is it possible to use this
> GroupingComparator technique to perform essentially a one-to-many mapping?
> Let's say I have records like so:
> id_1  -   metadata
> id_2  -   metadata
> id_1  A  numbers
> id_2  B  numbers
> id_1  C  numbers
> 
> Would it be possible for a key,value pair for <"id_1, -", metadata> to map
> to both the groups for the keys "id_1, A" and "id_1, C" ?  The comparator
> seems easy to achieve; but I don't see multiple copies of a record being
> sent to multiple groups.  I know it's a bit unusual, but it would be useful
> for us to have this kind of wildcard behavior.
> 
No, that's not possible without changing your app to generate that many
records. For example, your map could output one copy of each wildcard record
for every group it should belong to.
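
A minimal sketch of that fan-out, in plain Java rather than real Mapper code.
It assumes the mapper knows the full set of tags up front (KNOWN_TAGS below),
which the thread does not state. The class name, the "id,tag" key layout, and
the META:/DATA: value prefixes are likewise hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: plain Java, no Hadoop types.
class WildcardFanOut {
    // Assumption: every tag a wildcard record may need to join is known here.
    static final List<String> KNOWN_TAGS = Arrays.asList("A", "B", "C");

    // Returns the {key, value} pairs emitted for one input record.
    // A record tagged "-" is copied once per known tag; others pass through.
    static List<String[]> map(String id, String tag, String value) {
        List<String[]> out = new ArrayList<>();
        if ("-".equals(tag)) {
            for (String t : KNOWN_TAGS) {
                out.add(new String[] {id + "," + t, "META:" + value});
            }
        } else {
            out.add(new String[] {id + "," + tag, "DATA:" + value});
        }
        return out;
    }

    public static void main(String[] args) {
        for (String[] kv : map("id_1", "-", "metadata")) {
            System.out.println(kv[0] + "\t" + kv[1]);
        }
    }
}
```

With this approach a group such as "id_1,B" may receive only the metadata
copy and no numbers, so the reducer would need to skip groups that contain no
DATA: values.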
 
> Meng
> 
> 
> 
> On Mon, Jan 5, 2009 at 6:58 PM, Owen O'Malley <omalley@apache.org> wrote:
> 
>> This is exactly what the setOutputValueGroupingComparator is for. Take a
>> look at HADOOP-4545, for an example using the secondary sort. If you are
>> using trunk or 0.20, look at
>> src/examples/org/apache/hadoop/examples/SecondarySort.java. The checked in
>> example uses the new map/reduce api that was introduced in 0.20.
>> 
>> -- Owen
>> 


