hadoop-general mailing list archives

From "Owen O'Malley" <...@yahoo-inc.com>
Subject Re: [VOTE] Direction for Hadoop development
Date Mon, 06 Dec 2010 17:16:00 GMT

On Dec 1, 2010, at 11:11 AM, Owen O'Malley wrote:

    We really need some guidance on the general direction for the  
project. Please comment and/or vote. If no one cares, then I'll  
probably commit it to Yahoo's internal branch.

-- Owen

> The question is how the Hadoop project wants to move forward.
> It was motivated by Doug's veto of HADOOP-6685, which was based on  
> his personal decisions about how the project should go forward and  
> not on anything that had been decided by the PMC.
> These decisions are much more important to MapReduce, which is a
> framework, than to HDFS, which follows a client/server model.
> 1. Should Hadoop include a user-facing library of useful code?
> There has been a suggestion that user-facing library code, such as  
> SequenceFile, TFile, DistCp, etc. should be deprecated and that  
> Hadoop should allow third party projects like Avro to supply the  
> user-facing library code that makes Hadoop usable. I think it is  
> critical that we keep those components as part of Hadoop and extend  
> them as the framework evolves. Users depend heavily on SequenceFile  
> for storing their data in Hadoop, and those components should not be
> deprecated as Doug has suggested.
> 2. Should MapReduce support non-Writables through the pipeline out  
> of the box?
> There has also been a discussion about whether we should support
> non-Writables natively. There is already library code in Avro that lets
> users use Avro types in a custom MapReduce API. A general MapReduce  
> API that encompasses all of the serialization frameworks and does  
> not lock users into a particular one is much more powerful.
> Furthermore, making it convenient for the users, by including the  
> plugins in the default configuration and class path, will enable the  
> use of Avro, Thrift and ProtoBuf objects by people who would rather  
> not focus on serialization. Avro and Writables should not be the  
> only first class serializations that Hadoop supports by default.
> 3. Should a framework dependency on ProtoBuf be allowed?
> Doug has added several framework dependencies on Avro. The question
> is whether it is acceptable to use the ProtoBuf library in the  
> framework. Avro is good for uses where there are a lot of objects of  
> the same type. ProtoBuf is better for a small number of objects. The
> question is whether Avro, JSON, and XML should be the only  
> serialization libraries that are acceptable to use in the framework.
