hadoop-common-user mailing list archives

From Jim Twensky <jim.twen...@gmail.com>
Subject Re: I want to group "similar" keys in the reducer.
Date Mon, 15 Mar 2010 21:25:05 GMT
Hi Raymond,

Take a look at http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/Job.html#setGroupingComparatorClass(java.lang.Class).
I think this is what you want. Also make sure to implement a custom
partitioner that only takes into account the first part of the key,
namely the "KEY" part of "KEY1" and "KEY2". You can search for
"Secondary Sort" and "Hadoop" to find tutorials on this topic.
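The two pieces Jim mentions (a partitioner that looks only at the first part of the key, plus a grouping comparator) can be illustrated without a cluster. This is a plain-Java sketch, not Hadoop code. It assumes a hypothetical rule in which "similar" keys differ only in trailing digits, so KEY1 and KEY2 both reduce to the natural key KEY. The class and method names are made up for illustration. getPartition uses the same (hash & Integer.MAX_VALUE) % numReduceTasks formula as Hadoop's HashPartitioner but hashes only the natural key. GROUPING stands in for the grouping comparator.

```java
import java.util.Comparator;

// Plain-Java sketch (no Hadoop dependency) of partitioning and grouping
// on a "natural key". Assumption: similar keys differ only in trailing
// digits, e.g. "KEY1" and "KEY2" -> "KEY". All names are hypothetical.
final class SimilarKeyGrouping {

    // Hypothetical helper: strips trailing digits so KEY1, KEY2 -> KEY.
    static String naturalKey(String key) {
        int end = key.length();
        while (end > 0 && Character.isDigit(key.charAt(end - 1))) {
            end--;
        }
        return key.substring(0, end);
    }

    // Partitioner logic: hash only the natural key, using the same
    // (hash & Integer.MAX_VALUE) % n formula as HashPartitioner, so
    // every similar key is sent to the same reducer.
    static int getPartition(String key, int numReduceTasks) {
        return (naturalKey(key).hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Grouping comparator logic: keys whose natural keys match compare
    // as equal, so the framework would pass them to a single reduce() call.
    static final Comparator<String> GROUPING =
            Comparator.comparing(SimilarKeyGrouping::naturalKey);
}
```

In a real job, the same logic would go into a Partitioner subclass registered with Job.setPartitionerClass and a WritableComparator subclass registered with Job.setGroupingComparatorClass. The exact layout of those classes is left as an assumption here.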


2010/3/15 Gang Luo <lgpublic@yahoo.com.cn>:
> You need to define a pattern and implement your own partitioner so that all the similar
> keys you want to group go to the same reducer. On the reduce side, you will probably need to
> implement a secondary sort so that the keys you want to group are adjacent in the sorted
> input to the reducer. Because the reduce method processes one key at a time, you also need
> to maintain a window to buffer all the keys being grouped.
> -Gang
> ----- Original Message ----
> From: Raymond Jennings III <raymondjiii@yahoo.com>
> To: common-user@hadoop.apache.org
> Sent: 2010/3/15 (Monday) 1:26:09 PM
> Subject: I want to group "similar" keys in the reducer.
> Is it possible to override a method in the reducer so that similar keys will be grouped
> together? For example, I want all keys with the value "KEY1" or "KEY2" to be merged together.
> (My reducer has a key of type Text.) Thanks.
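Gang's reduce-side approach (sort so that similar keys are adjacent, then buffer values in a window until the group changes) can be simulated in the same way. This is again a plain-Java sketch under the same hypothetical trailing-digit rule. WindowedReduce and reduceSimilar are illustrative names, and each record is modeled as a {key, value} String pair rather than as Hadoop Writables.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation (no Hadoop dependency): secondary-sort the
// records so similar keys are adjacent, then buffer values in a
// "window" until the natural key changes. Names are hypothetical.
final class WindowedReduce {

    // Same hypothetical rule as before: KEY1, KEY2 -> KEY.
    static String naturalKey(String key) {
        int end = key.length();
        while (end > 0 && Character.isDigit(key.charAt(end - 1))) {
            end--;
        }
        return key.substring(0, end);
    }

    // Each record is {key, value}. Returns natural key -> buffered values.
    static Map<String, List<String>> reduceSimilar(List<String[]> records) {
        List<String[]> sorted = new ArrayList<>(records);
        // Secondary sort: order by natural key first, then by the full key.
        sorted.sort((a, b) -> {
            int c = naturalKey(a[0]).compareTo(naturalKey(b[0]));
            return c != 0 ? c : a[0].compareTo(b[0]);
        });

        Map<String, List<String>> out = new LinkedHashMap<>();
        String current = null;
        List<String> window = new ArrayList<>();
        for (String[] record : sorted) {
            String nk = naturalKey(record[0]);
            if (current != null && !current.equals(nk)) {
                out.put(current, window); // group finished: flush the window
                window = new ArrayList<>();
            }
            current = nk;
            window.add(record[1]);
        }
        if (current != null) {
            out.put(current, window); // flush the last group
        }
        return out;
    }
}
```

With input records (KEY2, b), (OTHER1, x), (KEY1, a), the values for KEY1 and KEY2 end up in one window under KEY, and OTHER1 forms its own group.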
