hadoop-common-dev mailing list archives

From Eric Baldeschwieler <eri...@yahoo-inc.com>
Subject Re: [jira] Commented: (HADOOP-115) Hadoop should allow the user to use SequentialFileOutputformat as the output format and to choose key/value classes that are different from those for map output.
Date Sun, 02 Apr 2006 00:39:21 GMT
Sure there is something wrong with requiring extra map-reduce  
passes.  Without significant development it can be very expensive  
(shuffling, sorting and rewriting your whole output set can be a  
significant burden).  Pointlessly so, since the extension is clear,  
safe and easier to explain than the restriction.

I think we can all agree that a project goal is to keep the design as  
simple and focused as possible.  I'd find an argument against an  
extension based on those goals pretty compelling, but the lack of a  
feature in a paper from Google doesn't seem like a compelling reason  
to reject something.  The Hadoop approach to many decisions varies  
from Google's, and this is not a bad thing.

I cannot think of a case where this proposed extension complicates  
code or reduces compressibility.  Since it is backwards compatible  
with your desired API, purists can simply ignore the option.

On Apr 1, 2006, at 9:29 AM, Andrew McNabb wrote:

> On Sat, Apr 01, 2006 at 06:19:27PM +0100, Teppo Kurki (JIRA) wrote:
>>
>> My original post about the issue gives a simple case that would  
>> benefit from this:
>> http://www.mail-archive.com/hadoop-user%40lucene.apache.org/msg00073.html
>>
>
> This should be done in two map-reduce phases.  There's nothing wrong
> with running two phases (or 10,000).
>
> -- 
> Andrew McNabb
> http://www.mcnabbs.org/andrew/
> PGP Fingerprint: 8A17 B57C 6879 1863 DE55  8012 AB4D 6098 8826 6868

