hadoop-common-user mailing list archives

From Mehmet Tepedelenlioglu <mehmets...@gmail.com>
Subject Re: Hadoop in Action Partitioner Example
Date Wed, 24 Aug 2011 00:09:50 GMT
Thanks, that is very useful to know.

On Aug 23, 2011, at 4:40 PM, Chris White wrote:

> Job.setGroupingComparatorClass lets you supply a RawComparator class
> that compares only the K1 component of K. The reduce-side sort will
> still order all keys using K's compareTo method, but the framework
> uses the grouping comparator when deciding which values to pass to a
> single call of the reduce method.
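Chris's two-comparator split (full sort order vs. grouping on K1 only) can be sketched in plain Java. This is a hypothetical simulation of what happens on the reduce side, not Hadoop's actual API; the `CompoundKey` class and field names are illustrative.

```java
import java.util.*;

// Hypothetical compound key (K1, K2); names are illustrative, not from the book.
class CompoundKey implements Comparable<CompoundKey> {
    final String k1;   // grouping component
    final int k2;      // secondary-sort component
    CompoundKey(String k1, int k2) { this.k1 = k1; this.k2 = k2; }
    // Full sort order: K1 first, then K2 (what the reduce-side sort uses).
    public int compareTo(CompoundKey o) {
        int c = k1.compareTo(o.k1);
        return c != 0 ? c : Integer.compare(k2, o.k2);
    }
    public String toString() { return "(" + k1 + "," + k2 + ")"; }
}

public class GroupingDemo {
    // Stands in for the grouping comparator: keys are "equal" iff K1 matches.
    static final Comparator<CompoundKey> GROUP_ON_K1 =
        Comparator.comparing(k -> k.k1);

    // Sort with the full comparator, then cut group boundaries with
    // GROUP_ON_K1 -- the two-comparator split Hadoop performs at reduce time.
    static List<List<CompoundKey>> groups(List<CompoundKey> keys) {
        List<CompoundKey> sorted = new ArrayList<>(keys);
        Collections.sort(sorted);
        List<List<CompoundKey>> out = new ArrayList<>();
        for (CompoundKey k : sorted) {
            if (out.isEmpty()
                || GROUP_ON_K1.compare(out.get(out.size() - 1).get(0), k) != 0) {
                out.add(new ArrayList<>());
            }
            out.get(out.size() - 1).add(k);
        }
        return out;
    }

    public static void main(String[] args) {
        List<CompoundKey> keys = Arrays.asList(
            new CompoundKey("a", 2), new CompoundKey("b", 1), new CompoundKey("a", 1));
        // Both "a" keys land in one reduce call, internally sorted by K2;
        // "b" gets its own call.
        System.out.println(groups(keys)); // [[(a,1), (a,2)], [(b,1)]]
    }
}
```

Note that the values within one group arrive in full-key order, which is exactly what makes the secondary-sort pattern work.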
> On Tue, Aug 23, 2011 at 7:25 PM, Mehmet Tepedelenlioglu
> <mehmetsino@gmail.com> wrote:
>> For those of you who have the book: on page 49 there is a custom partitioner example.
>> It describes a situation where the map emits <K,V>, where the key is a compound
>> key (K1,K2), and we want to reduce over K1 rather than the whole of K. This is used
>> as an example of a situation where a custom partitioner should be written to hash on
>> K1, so that keys sharing a K1 are sent to the same reducer. But as far as I know,
>> although this would partition the keys correctly (send them to the correct reducers),
>> the reduce function would still be called with (grouped under) the original keys K,
>> not yielding the desired result. The only way of doing this that I know of is to
>> create a new WritableComparable that carries all of K but uses only K1 in its
>> hash/equals/compare methods, in which case you would not need to write your own
>> partitioner anyway. Am I misinterpreting something the author meant, or is there
>> something going on that I don't know about? It would have been sweet if I could
>> accomplish all that with just the partitioner. Either I am misunderstanding something
>> fundamental, or I am misunderstanding the example's intention, or there is something
>> wrong with it.
>> Thanks,
>> Mehmet
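The partitioning half of the question (hashing only K1 so all (K1, *) keys reach the same reducer) can be illustrated in plain Java. This is a hypothetical sketch mirroring the shape of Hadoop's Partitioner.getPartition(key, value, numPartitions), not the book's actual code; the class and parameter names are illustrative.

```java
// Hypothetical partitioner logic: hash only the K1 component of a
// compound (k1, k2) key. The k2 argument is accepted but ignored, so
// every key sharing a k1 maps to the same partition (reducer).
public class K1Partitioner {
    static int getPartition(String k1, int k2, int numPartitions) {
        // Mask off the sign bit before the modulo, as Hadoop's default
        // HashPartitioner does, to keep the result non-negative.
        return (k1.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        // (x,1) and (x,99) route to the same partition; k2 never matters.
        System.out.println(
            getPartition("x", 1, 4) == getPartition("x", 99, 4)); // true
    }
}
```

As the thread concludes, this routing alone is not enough: without a grouping comparator, the reducer would still see each full (K1,K2) key as its own group.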
