hadoop-common-user mailing list archives

From "Meng Mao" <meng...@gmail.com>
Subject Re: correct pattern for using setOutputValueGroupingComparator?
Date Tue, 06 Jan 2009 04:17:07 GMT
Unfortunately, my team is on 0.15 :(. We are looking to upgrade to 0.18 as
soon as we upgrade our hardware (long story).
From comparing the 0.15 and 0.19 MapReduce tutorials, and looking at the
HADOOP-4545 patch, I don't see anything majorly different about the
MapReduce API. Is that right?
- There's a Partitioner that's used, but that seems optional?
- I see that 0.19 still provides setOutputValueGroupingComparator; is the
setGroupingComparatorClass used in the patch part of the new 0.20 API?
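On the Partitioner question: it matters as soon as a job runs more than one reducer. The sketch below is a plain-Java simulation, not Hadoop code, of what the shuffle does with a composite (id, tag) key. The class, record, and method names are hypothetical; the comments only note which old-API JobConf setting each piece stands in for (setPartitionerClass, setOutputKeyComparatorClass, setOutputValueGroupingComparator), assuming the 0.18/0.19 API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Plain-Java simulation (no Hadoop classes) of secondary sort; all names are hypothetical. */
public class SecondarySortSketch {

    // Hypothetical composite key: the natural key (id) plus a secondary field (tag).
    record CompositeKey(String id, String tag) {}

    // Stands in for a custom Partitioner (JobConf.setPartitionerClass): hash the id only.
    // Hashing the whole key, as the default hash partitioner does, could send
    // ("id_1", "A") and ("id_1", "C") to different reducers and break the grouping.
    static int partition(CompositeKey key, int numReducers) {
        return (key.id().hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Stands in for the sort comparator (JobConf.setOutputKeyComparatorClass): id, then tag.
    static final Comparator<CompositeKey> SORT =
            Comparator.comparing(CompositeKey::id).thenComparing(CompositeKey::tag);

    // Stands in for the grouping comparator (setOutputValueGroupingComparator): id only.
    static final Comparator<CompositeKey> GROUP = Comparator.comparing(CompositeKey::id);

    /** Returns the groups one reducer would see; each group lists its tags in sorted order. */
    static List<List<String>> reduceInput(List<CompositeKey> keys, int reducer, int numReducers) {
        List<CompositeKey> mine = new ArrayList<>();
        for (CompositeKey k : keys) {
            if (partition(k, numReducers) == reducer) {
                mine.add(k);
            }
        }
        mine.sort(SORT);
        List<List<String>> groups = new ArrayList<>();
        CompositeKey prev = null;
        for (CompositeKey k : mine) {
            // A new reduce() call starts whenever the grouping comparator sees a different id.
            if (prev == null || GROUP.compare(prev, k) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(k.tag());
            prev = k;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<CompositeKey> keys = List.of(
                new CompositeKey("id_1", "C"), new CompositeKey("id_2", "B"),
                new CompositeKey("id_1", "A"), new CompositeKey("id_2", "-"),
                new CompositeKey("id_1", "-"));
        for (int r = 0; r < 2; r++) {
            System.out.println("reducer " + r + ": " + reduceInput(keys, r, 2));
        }
    }
}
```

With a single reducer every key lands in the same partition, which may be why the Partitioner looks optional in small tests.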

I have a related question: is it possible to use this grouping comparator
technique to perform what is essentially a one-to-many mapping?
Let's say I have records like so:
id_1  -   metadata
id_2  -   metadata
id_1  A  numbers
id_2  B  numbers
id_1  C  numbers

Would it be possible for the key/value pair <"id_1, -", metadata> to map
to both of the groups for the keys "id_1, A" and "id_1, C"? Writing the
comparator seems easy, but I don't see any way for copies of one record to
be sent to multiple groups. I know it's a bit unusual, but this kind of
wildcard behavior would be useful for us.
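One way to get the wildcard effect without copying records is to group on the id alone and let the sort place the metadata first, so a single reduce call sees the metadata followed by every tagged record for that id. This is a suggested alternative, not something proposed in the thread. The sketch below simulates it in plain Java (no Hadoop classes); `Rec`, `run`, and the payload strings are hypothetical, and it relies on "-" sorting before the letter tags.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Plain-Java sketch (no Hadoop) of a one-to-many join via secondary sort; names are hypothetical. */
public class MetadataJoinSketch {

    // Hypothetical record shape matching the example lines: id, tag, payload.
    record Rec(String id, String tag, String payload) {}

    // '-' (0x2D) sorts before 'A'..'Z', so the metadata record leads each id's values.
    static final String METADATA_TAG = "-";

    /** Simulates sort + grouping by id, then the reduce step, for a single reducer. */
    static List<String> run(List<Rec> input) {
        List<Rec> sorted = new ArrayList<>(input);
        // Sort order: id first, then tag.
        sorted.sort(Comparator.comparing(Rec::id).thenComparing(Rec::tag));

        List<String> out = new ArrayList<>();
        String currentId = null;
        String metadata = null;
        for (Rec r : sorted) {
            // Grouping by id only: a new reduce() call starts whenever the id changes.
            if (!r.id().equals(currentId)) {
                currentId = r.id();
                metadata = null;
            }
            if (r.tag().equals(METADATA_TAG)) {
                metadata = r.payload(); // first value of the group; remember it
            } else {
                // The one metadata record is reused for every tagged record of this id.
                out.add(r.id() + " " + r.tag() + " " + metadata + " " + r.payload());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Rec> input = List.of(
                new Rec("id_1", "-", "meta1"),
                new Rec("id_2", "-", "meta2"),
                new Rec("id_1", "A", "n1"),
                new Rec("id_2", "B", "n2"),
                new Rec("id_1", "C", "n3"));
        // Prints: "id_1 A meta1 n1", "id_1 C meta1 n3", "id_2 B meta2 n2"
        run(input).forEach(System.out::println);
    }
}
```

Because only the single metadata value is buffered per group, the reducer can still stream over the remaining values in one pass.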


On Mon, Jan 5, 2009 at 6:58 PM, Owen O'Malley <omalley@apache.org> wrote:

> This is exactly what the setOutputValueGroupingComparator is for. Take a
> look at HADOOP-4545 for an example using the secondary sort. If you are
> using trunk or 0.20, look at
> src/examples/org/apache/hadoop/examples/SecondarySort.java. The checked in
> example uses the new map/reduce api that was introduced in 0.20.
> -- Owen
