hadoop-common-dev mailing list archives

From "Arkady Borkovsky" <ark...@inktomi.com>
Subject Re: IdentityMapper
Date Thu, 20 Apr 2006 04:33:23 GMT
Eric has a great point.

It is pretty common to produce a set of records in the map step, group
them by key in the reduce step, and store them for future use.
Whenever this data is used later, it is already grouped by key and
essentially ready for reduce.
Special-casing this may be a useful optimization.
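
To make the idea concrete, here is a minimal sketch in plain Python
rather than Hadoop code. It assumes the stored records are (key, value)
pairs in which equal keys are already adjacent. The name
reduce_presorted and the sample data are hypothetical, chosen only for
illustration, and are not part of any Hadoop API.

```python
from itertools import groupby
from operator import itemgetter


def reduce_presorted(records, reduce_fn):
    """Reduce (key, value) records that are already grouped by key.

    Assumption: equal keys are adjacent in the input, so neither a map
    pass nor a sort is needed. One pass over the input is enough.
    """
    for key, group in groupby(records, key=itemgetter(0)):
        # Hand only the values of this key's run to the reduce function.
        yield key, reduce_fn(value for _, value in group)


# Hypothetical stored output of an earlier job, already grouped by key.
stored = [("a", 1), ("a", 2), ("b", 5), ("c", 3), ("c", 4)]
totals = list(reduce_presorted(stored, sum))
```

If the adjacency assumption does not hold, the same key would be
reduced more than once, so a real special case would have to know that
the input really is grouped.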

-- ab

On Apr 19, 2006, at 5:34 PM, Eric Baldeschwieler wrote:

> It might be cool to special-case a reduce on sorted input.
> On Apr 18, 2006, at 12:28 PM, Doug Cutting wrote:
>> Stefan Groschupf wrote:
>>> What is the reason that each job that has no mapper defined runs the
>>> IdentityMapper?
>>> Handling different formats (as discussed) between mapping and
>>> reducing is difficult.
>>> Having one job that just maps in one format and another job that
>>> just reduces in another format would be a nice workaround for the
>>> format problem, but the IdentityMapper makes this workaround impossible.
>> Stefan,
>> I don't understand the problem here.  Some map function is required 
>> for any data to make it to reduce.  IdentityMapper simply copies all 
>> map input without altering it.  How does this cause you problems?  
>> Would you prefer a NullMapper by default, that does nothing?  That 
>> would result in no output sent to reduce.
>> Thanks,
>> Doug
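
Doug's point about the default mapper can be sketched with a toy model
in plain Python. This is not Hadoop code: run_job, identity_mapper and
null_mapper are hypothetical names, and the in-memory grouping only
stands in for the real shuffle between map and reduce.

```python
from collections import defaultdict


def identity_mapper(key, value):
    # Passes each input record through unchanged.
    yield key, value


def null_mapper(key, value):
    # Emits nothing, so no record ever reaches reduce.
    return iter(())


def run_job(records, mapper, reducer):
    """Toy map -> group-by-key -> reduce pipeline (an assumption, not Hadoop)."""
    grouped = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            grouped[out_key].append(out_value)
    # Reduce each key's values, visiting keys in sorted order.
    return {key: reducer(values) for key, values in sorted(grouped.items())}


records = [("b", 2), ("a", 1), ("a", 3)]
with_identity = run_job(records, identity_mapper, sum)
with_null = run_job(records, null_mapper, sum)
```

With the identity mapper every input record reaches the reducer; with
the null mapper the reducer is never called and the result is empty.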
