hadoop-general mailing list archives

From Owen O'Malley <omal...@apache.org>
Subject Re: [VOTE] Direction for Hadoop development
Date Wed, 01 Dec 2010 19:11:41 GMT

On Nov 30, 2010, at 5:57 PM, Konstantin Shvachko wrote:

> This sounds like an important issue. But I personally don't understand
> what exactly the controversy is, and therefore what is this vote about,
> and what are the choices, if any.

The question is how the Hadoop project wants to move forward.

It was motivated by Doug's veto of HADOOP-6685, which was based on his  
personal decisions about how the project should go forward and not on  
anything that had been decided by the PMC.

These decisions matter much more to MapReduce, which is a framework,
than to HDFS, which is a client/server model.

1. Should Hadoop include a user-facing library of useful code?

There has been a suggestion that user-facing library code, such as
SequenceFile, TFile, DistCp, etc., should be deprecated and that Hadoop
should allow third-party projects like Avro to supply the user-facing
library code that makes Hadoop usable. I think it is critical that we
keep those components as part of Hadoop and extend them as the
framework evolves. Users depend heavily on SequenceFile for storing
their data in Hadoop, and these components should not be deprecated as
Doug has suggested.

2. Should MapReduce support non-Writables through the pipeline out of the box?

There has also been a discussion about whether we should support
non-Writables natively. There is already library code in Avro that
lets users work with Avro types through a custom MapReduce API. A
general MapReduce API that encompasses all of the serialization
frameworks and does not lock users into a particular one is much more
powerful.

Furthermore, making this convenient for users, by including the
plugins in the default configuration and class path, will let people
who would rather not focus on serialization use Avro, Thrift, and
ProtoBuf objects. Avro and Writables should not be the only
first-class serializations that Hadoop supports by default.

3. Should a framework dependency on ProtoBuf be allowed?

Doug has added several framework dependencies on Avro. The question is
whether it is acceptable to use the ProtoBuf library in the framework.
Avro is good for uses where there are a lot of objects of the same
type; ProtoBuf is better for a small number of objects. Put
differently, the question is whether Avro, JSON, and XML should be the
only serialization libraries that are acceptable to use in the
framework.

-- Owen