hadoop-common-user mailing list archives

From Devaraj Das <d...@yahoo-inc.com>
Subject Re: correct pattern for using setOutputValueGroupingComparator?
Date Tue, 06 Jan 2009 08:15:05 GMT

On 1/6/09 9:47 AM, "Meng Mao" <mengmao@gmail.com> wrote:

> Unfortunately, my team is on 0.15 :(. We are looking to upgrade to 0.18 as
> soon as we upgrade our hardware (long story).
> From comparing the 0.15 and 0.19 mapreduce tutorials, and looking at the
> 4545 patch, I don't see anything that seems majorly different about the
> MapReduce API?
> - There's a Partitioner that's used, but that seems optional?
> - I see that 0.19 still provides setOutputValueGroupingComparator; is the
> setGroupingComparatorClass in the patch from the 0.20 API?
Yes, setGroupingComparatorClass is defined in the new MapReduce API and does
the same thing as setOutputValueGroupingComparator.
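
To illustrate what the grouping comparator does in either API, here is a
minimal plain-Java sketch. It uses no Hadoop classes; GroupingSketch, idOf and
group are hypothetical names, and the "id,tag" string key is only an assumed
encoding of your composite key. Keys are sorted on the full key, and a new
reduce group starts only when the grouping comparator reports a difference.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java simulation of sort + grouping comparators (not the Hadoop API).
class GroupingSketch {
    // Sort comparator: orders by the full composite key "id,tag".
    static final Comparator<String> SORT = Comparator.naturalOrder();

    // Grouping comparator: looks only at the id part of the key.
    static final Comparator<String> GROUP =
        (a, b) -> idOf(a).compareTo(idOf(b));

    // Assumed key encoding: everything before the first comma is the id.
    static String idOf(String key) {
        return key.split(",")[0];
    }

    // Sorts the keys, then opens a new group whenever the grouping
    // comparator says the current key differs from the previous one.
    static List<List<String>> group(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(SORT);
        List<List<String>> groups = new ArrayList<>();
        String prev = null;
        for (String k : sorted) {
            if (prev == null || GROUP.compare(prev, k) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(k);
            prev = k;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> keys =
            Arrays.asList("id_1,-", "id_2,-", "id_1,A", "id_2,B", "id_1,C");
        for (List<String> g : group(keys)) {
            System.out.println(String.join(" | ", g));
        }
    }
}
```

With this encoding the "-" record sorts ahead of the lettered records for the
same id, so each group would see its metadata first.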

> I have an associated question -- is it possible to use this
> GroupingComparator technique to perform essentially a one-to-many mapping?
> Let's say I have records like so:
> id_1  -   metadata
> id_2  -   metadata
> id_1  A  numbers
> id_2  B  numbers
> id_1  C  numbers
> Would it be possible for a key,value pair for <"id_1, -", metadata> to map
> to both the groups for the keys "id_1, A" and "id_1, C" ?  The comparator
> seems easy to achieve; but I don't see multiple copies of a record being
> sent to multiple groups.  I know it's a bit unusual, but it would be useful
> for us to have this kind of wildcard behavior.
No, that's not possible without changing your app to generate that many
records. For example, in your map, you could output multiple records
corresponding to the wild-card records.
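
A rough sketch of that map-side fan-out, in plain Java with no Hadoop classes.
It assumes the set of tags is known to the mapper up front (KNOWN_TAGS here),
which may not hold for your data; WildcardFanOut, the "id,tag" key encoding
and the "META:" value prefix are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java stand-in for the body of a map() call (not the Hadoop API).
class WildcardFanOut {
    // ASSUMPTION: every tag a wildcard record must reach is known in advance.
    static final List<String> KNOWN_TAGS = Arrays.asList("A", "B", "C");

    // Returns the {key, value} pairs to emit for one input record.
    // A record tagged "-" is copied once per known tag; others pass through.
    static List<String[]> map(String id, String tag, String payload) {
        List<String[]> out = new ArrayList<>();
        if ("-".equals(tag)) {
            for (String t : KNOWN_TAGS) {
                // Prefix marks the value as metadata so the reducer can tell.
                out.add(new String[] {id + "," + t, "META:" + payload});
            }
        } else {
            out.add(new String[] {id + "," + tag, payload});
        }
        return out;
    }

    public static void main(String[] args) {
        for (String[] kv : map("id_1", "-", "metadata")) {
            System.out.println(kv[0] + " -> " + kv[1]);
        }
    }
}
```

Fanning out to every known tag means some groups (say "id_1,B") may receive
only the metadata copy, so the reducer would need to skip those.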
> Meng
> On Mon, Jan 5, 2009 at 6:58 PM, Owen O'Malley <omalley@apache.org> wrote:
>> This is exactly what the setOutputValueGroupingComparator is for. Take a
>> look at HADOOP-4545, for an example using the secondary sort. If you are
>> using trunk or 0.20, look at
>> src/examples/org/apache/hadoop/examples/SecondarySort.java. The checked in
>> example uses the new map/reduce api that was introduced in 0.20.
>> -- Owen
