hadoop-common-user mailing list archives

From Amar Kamat <ama...@yahoo-inc.com>
Subject Re: Single output file per reduce key?
Date Thu, 17 Jan 2008 07:11:14 GMT
Myles Grant wrote:
> I would like the values for a key to exist in a single file, and only 
> the values for that key.
Reducer.reduce() is invoked exactly once per key, and that single call 
receives all the values associated with the key:
Reducer.reduce(key, <value1, value2, value3, ...>);
So what I suggested should let you generate one file per key. You 
already have an iterator over all the values for that key, so there is 
not much left to do. And since the input to the reducer is sorted, you 
can be sure that all the values for a key are passed to the same 
Reducer.reduce() call.
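
A minimal sketch of that idea in plain Java, to show the shape of the logic. It is a stand-in, not Hadoop code: the class name PerKeyFileReducer, the three-argument reduce() signature, and the use of java.nio for output are all assumptions made for illustration. In a real job the write would go through Hadoop's FileSystem API into the job's output location, and the key is assumed here to be a valid file name.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical stand-in for a reducer; it does not use any Hadoop classes.
class PerKeyFileReducer {

    // Called once per key with an iterator over all of that key's values,
    // mirroring how the framework invokes reduce().
    // Assumption: the key is a valid file name on the target file system.
    static Path reduce(String key, Iterator<String> values, Path outputDir) throws IOException {
        Path out = outputDir.resolve(key);
        try (BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            // All values for the key arrive in this one call, so no
            // bookkeeping across calls is needed: open, dump, close.
            while (values.hasNext()) {
                writer.write(values.next());
                writer.newLine();
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("per-key-output");
        Path a = reduce("apple", Arrays.asList("1", "2", "3").iterator(), dir);
        Path b = reduce("banana", Arrays.asList("7").iterator(), dir);
        System.out.println("Wrote " + a + " and " + b);
    }
}
```

Each call opens one file named after the key, writes every value on its own line, and closes the file before the next key is handled.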
> Each reduced key/value would get its own file.  If I understand 
> correctly, all output of the reducers is written to a single file.
> -Myles
> On Jan 16, 2008, at 9:29 PM, Amar Kamat wrote:
>> Hi,
>> Why couldn't you just write this logic in your reducer class? The 
>> reduce method [reduceClass.reduce()] is invoked with a key and an 
>> iterator over the values associated with the key. Since the input to 
>> the reducer is sorted, you can simply dump the values into a file, 
>> i.e. no bookkeeping is required. I think this is what you wanted, no?
>> Myles Grant wrote:
>>> Hello,
>>> I'd like my reduce tasks to each output a single file per key, 
>>> containing the value. Each file would be named with the key.  It 
>>> appears that I need to (at least) create a new OutputFormat and 
>>> possibly a RecordWriter.  As doing this would likely involve a lot 
>>> of trial and error on my part, I was curious if someone had 
>>> implemented this already and would like to share.  I will be needing 
>>> both versions that write text files and binary files eventually.
>>> Short of a full existing implementation that I can steal, how about 
>>> some hints?
>>> Cheers,
>>> Myles
