hadoop-common-user mailing list archives

From Brad Tofel <b...@archive.org>
Subject Re: Reduce function
Date Tue, 19 Oct 2010 00:58:05 GMT
Whoops, I just re-read your message and see you may be asking about 
targeting a reduce callback function, not a reduce task.

If that's the case, I'm not sure I understand what your "bit/tag" is 
for, and what you're trying to do with it. Can you provide a concrete 
example (not necessarily code) of some keys which need to group together?

Is there a way to embed the "bit" within the value, so the keys to be 
joined are always identical?
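
A minimal plain-Java sketch of that suggestion, assuming a two-table join: the map side emits the untagged join key and carries a source tag inside the value, so ordinary key grouping already brings both sides together. `ValueTagJoinSketch`, `TaggedValue`, the tag values (0 for one table, 1 for the other) and the sample records are hypothetical, and a `TreeMap` stands in for Hadoop's shuffle/sort; no Hadoop classes are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of "put the tag in the value, not the key" for a
// reduce-side join. No Hadoop classes are used; a TreeMap stands in for the
// framework's shuffle/sort, which groups map output by key.
class ValueTagJoinSketch {

    // Hypothetical map output value: source tag (0 = left table, 1 = right table) plus payload.
    static final class TaggedValue {
        final int tag;
        final String payload;

        TaggedValue(int tag, String payload) {
            this.tag = tag;
            this.payload = payload;
        }
    }

    // Group (key, value) pairs by their untagged key, as the shuffle would.
    static Map<String, List<TaggedValue>> shuffle(List<Map.Entry<String, TaggedValue>> mapOutput) {
        Map<String, List<TaggedValue>> grouped = new TreeMap<>();
        for (Map.Entry<String, TaggedValue> e : mapOutput) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        return grouped;
    }

    // One simulated reduce call: split values by tag, then emit every left/right pairing.
    static List<String> reduce(String key, List<TaggedValue> values) {
        List<String> left = new ArrayList<>();
        List<String> right = new ArrayList<>();
        for (TaggedValue v : values) {
            (v.tag == 0 ? left : right).add(v.payload);
        }
        List<String> joined = new ArrayList<>();
        for (String l : left) {
            for (String r : right) {
                joined.add(key + "\t" + l + "\t" + r);
            }
        }
        return joined;
    }

    // Run the simulated shuffle, then one reduce call per distinct key.
    static List<String> join(List<Map.Entry<String, TaggedValue>> mapOutput) {
        List<String> out = new ArrayList<>();
        shuffle(mapOutput).forEach((key, values) -> out.addAll(reduce(key, values)));
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, TaggedValue>> mapOutput = List.of(
                Map.entry("42", new TaggedValue(1, "order-A")),
                Map.entry("42", new TaggedValue(0, "Alice")),
                Map.entry("7", new TaggedValue(0, "Bob")));
        join(mapOutput).forEach(System.out::println);
    }
}
```

Because the key never carries the tag, the default partitioning and grouping need no changes; the trade-off is that the reducer may have to buffer values in memory, since the two sides arrive in no guaranteed order.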

If you really need to fake out the system so different keys arrive in 
the same reduce, you might be able to do it with a combination of:

org.apache.hadoop.mapreduce.Job

.setSortComparatorClass()
.setGroupingComparatorClass()
.setPartitionerClass()
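
The effect of the two comparator hooks can be simulated in plain Java, assuming a composite key of (join key, tag) and assuming the Partitioner already sends every tag of a join key to the same reducer. The sort comparator orders records by join key, then tag; the grouping comparator compares the join key only, so consecutive keys that differ only in their tag feed one reduce call. `SecondarySortSketch`, `TaggedKey` and `Pair` are hypothetical names; in a real job the comparators would be `RawComparator` implementations (typically `WritableComparator` subclasses) passed to the `Job` methods above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java simulation (no Hadoop) of what a sort comparator and a grouping
// comparator achieve for a composite (joinKey, tag) key within one reducer.
class SecondarySortSketch {

    // Hypothetical composite map output key: natural join key plus a source tag.
    static final class TaggedKey {
        final String joinKey;
        final int tag;

        TaggedKey(String joinKey, int tag) {
            this.joinKey = joinKey;
            this.tag = tag;
        }
    }

    static final class Pair {
        final TaggedKey key;
        final String value;

        Pair(TaggedKey key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Sort order: join key first, then tag, so tag-0 values come first within each join key.
    static final Comparator<TaggedKey> SORT =
            Comparator.comparing((TaggedKey k) -> k.joinKey).thenComparingInt(k -> k.tag);

    // Grouping: only the join key matters, so different tags share one reduce call.
    static final Comparator<TaggedKey> GROUP = Comparator.comparing((TaggedKey k) -> k.joinKey);

    // Returns the value list that each simulated reduce() call would receive.
    static List<List<String>> reduceGroups(List<Pair> partition) {
        List<Pair> sorted = new ArrayList<>(partition);
        sorted.sort((a, b) -> SORT.compare(a.key, b.key));
        List<List<String>> groups = new ArrayList<>();
        TaggedKey previous = null;
        for (Pair p : sorted) {
            // Start a new reduce call whenever the grouping comparator sees a different key.
            if (previous == null || GROUP.compare(previous, p.key) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(p.value);
            previous = p.key;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Pair> partition = List.of(
                new Pair(new TaggedKey("42", 1), "order-A"),
                new Pair(new TaggedKey("42", 0), "Alice"),
                new Pair(new TaggedKey("42", 1), "order-B"));
        System.out.println(reduceGroups(partition));
    }
}
```

Keeping the tag in the sort order means each reduce call sees one side's values before the other's, so a join reducer only needs to buffer the first side.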

Brad

On 10/18/2010 05:41 PM, Brad Tofel wrote:
> The "Partitioner" implementation used with your job should define 
> which reduce target receives a given map output key.
>
> I don't know whether an existing Partitioner implementation meets your 
> needs, but it's not a very complex interface to implement if nothing 
> existing works for you.
>
> Brad
>
> On 10/18/2010 04:43 PM, Shi Yu wrote:
>> How many tags do you have? If you have several tags, you'd better 
>> create a Vector class to hold them, and define a sum function to 
>> increment the tag values. Then the value class should be your new 
>> Vector class. That's cleaner than the TextPair approach.
>>
>> Shi
>>
>> On 2010-10-18 5:19, Matthew John wrote:
>>> Hi all,
>>>
>>> I have a small question about the reduce phase. My understanding is
>>> that after the shuffle/sort phase, all the records with the same key
>>> go into one reduce function. If that's the case, what attribute of
>>> the Writable key ensures that all of those keys go to the same reduce?
>>>
>>> I am working on a reduce-side join where I need to tag all the keys
>>> with a bit that might vary, but I still want all those records to go
>>> into the same reduce. In Hadoop: The Definitive Guide, pg. 235, they
>>> use TextPair for the key, but I don't understand how keys with
>>> different tag information go into the same reduce.
>>>
>>> Matthew
>>>
>>
>>
>
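
Brad's answer to Matthew's question, that the Partitioner decides the reduce target, can be illustrated in plain Java. Hadoop's default `HashPartitioner` computes `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, so if a composite key's `hashCode()` mixes in the tag, the same join key can be sent to different reducers. The sketch below assumes such a composite hash (here `Objects.hash(joinKey, tag)`, a hypothetical choice) and contrasts it with partitioning on the join key alone; `PartitionSketch` and its methods are illustrative names. In a real job that logic would live in a `Partitioner` subclass's `getPartition(key, value, numPartitions)`, registered with `Job.setPartitionerClass()`.

```java
import java.util.Objects;

// Plain-Java illustration (no Hadoop dependency) of how a reduce target is chosen.
class PartitionSketch {

    // Same formula as Hadoop's default HashPartitioner:
    // mask off the sign bit, then take the remainder by the number of reducers.
    static int hashPartition(int keyHash, int numReduceTasks) {
        return (keyHash & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Hypothetical hashCode of a composite (joinKey, tag) key that hashes both fields.
    static int compositeKeyHash(String joinKey, int tag) {
        return Objects.hash(joinKey, tag);
    }

    // Default behaviour: the whole composite key decides the partition, so the
    // same join key with different tags may land on different reducers.
    static int defaultPartition(String joinKey, int tag, int numReduceTasks) {
        return hashPartition(compositeKeyHash(joinKey, tag), numReduceTasks);
    }

    // Custom behaviour: the tag is deliberately ignored; only the join key is hashed.
    static int joinKeyPartition(String joinKey, int tag, int numReduceTasks) {
        return hashPartition(joinKey.hashCode(), numReduceTasks);
    }

    public static void main(String[] args) {
        int reducers = 2;
        for (int tag = 0; tag <= 1; tag++) {
            System.out.printf("tag=%d default=%d joinKeyOnly=%d%n", tag,
                    defaultPartition("42", tag, reducers),
                    joinKeyPartition("42", tag, reducers));
        }
    }
}
```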

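Shi's suggestion of a value class that holds one counter per tag might look like the following plain-Java sketch. `TagVector` and its methods are hypothetical; `write` and `readFields` copy the method signatures of `org.apache.hadoop.io.Writable`, so the class could implement that interface in a real job, but here it does not, which keeps it compilable without Hadoop.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical value class holding one counter per tag. write/readFields mirror
// org.apache.hadoop.io.Writable, but the interface is not implemented here.
class TagVector {
    private long[] counts;

    // Hadoop creates value objects reflectively, so a real Writable needs a no-arg constructor.
    TagVector() {
        this(0);
    }

    TagVector(int numTags) {
        counts = new long[numTags];
    }

    // Increment the counter for one tag (e.g. in the mapper).
    void increment(int tag) {
        counts[tag]++;
    }

    // Element-wise sum, e.g. to fold all values for a key in the reducer.
    void add(TagVector other) {
        if (other.counts.length != counts.length) {
            throw new IllegalArgumentException("tag vectors differ in length");
        }
        for (int i = 0; i < counts.length; i++) {
            counts[i] += other.counts[i];
        }
    }

    long get(int tag) {
        return counts[tag];
    }

    // Serialize: the number of tags, then each counter.
    void write(DataOutput out) throws IOException {
        out.writeInt(counts.length);
        for (long c : counts) {
            out.writeLong(c);
        }
    }

    // Deserialize in the same order, replacing the current contents.
    void readFields(DataInput in) throws IOException {
        counts = new long[in.readInt()];
        for (int i = 0; i < counts.length; i++) {
            counts[i] = in.readLong();
        }
    }

    @Override
    public String toString() {
        return Arrays.toString(counts);
    }

    public static void main(String[] args) throws IOException {
        TagVector a = new TagVector(3);
        a.increment(0);
        TagVector b = new TagVector(3);
        b.increment(2);
        b.increment(2);
        a.add(b);

        // Round-trip through bytes, as the framework would between map and reduce.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        a.write(new DataOutputStream(bytes));
        TagVector copy = new TagVector();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        System.out.println(copy);
    }
}
```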
