hadoop-general mailing list archives

From Konstantin Shvachko <shv.had...@gmail.com>
Subject Re: [VOTE] Direction for Hadoop development
Date Tue, 07 Dec 2010 11:27:41 GMT
It really takes time to understand the issue. I will spend more time reading
through it.

So far I feel that we need to distinguish between
a) issues that define the general direction for the project, and
b) the specifics of the implementation proposed by Owen, including decisions
induced by that implementation.

The main contentious issue on which Owen and Doug (and other people as well)
disagree is whether Hadoop should support multiple serializations or be based
on one designated serialization.
This is a defining, general-direction a-issue. I believe this is vote-able.

The question of introducing dependency on ProtoBuf is a b-issue, as it can
be implemented differently.
Say with "pluggable" APIs as Tom proposed. This is probably a consensus-type

It looks to me that if we decide on multiple vs. designated serializations,
some b-issues may be automatically ruled in or out.


P.S. We used to have a tradition of presenting design documents before
introducing such big changes.
I believe a discussion of a design doc would have reduced tensions we face

On Mon, Dec 6, 2010 at 9:16 AM, Owen O'Malley <oom@yahoo-inc.com> wrote:

> On Dec 1, 2010, at 11:11 AM, Owen O'Malley wrote:
> All,
>   We really need some guidance on the general direction for the project.
> Please comment and/or vote. If no one cares, then I'll probably commit it to
> Yahoo's internal branch.
> -- Owen
>> The question is how the Hadoop project wants to move forward.
>> It was motivated by Doug's veto of HADOOP-6685, which was based on his
>> personal decisions about how the project should go forward and not on
>> anything that had been decided by the PMC.
>> These decisions are much more important to MapReduce, which is a
>> framework, than HDFS which is a client/server model.
>> 1. Should Hadoop include a user-facing library of useful code?
>> There has been a suggestion that user-facing library code, such as
>> SequenceFile, TFile, DistCp, etc. should be deprecated and that Hadoop
>> should allow third party projects like Avro to supply the user-facing
>> library code that makes Hadoop usable. I think it is critical that we keep
>> those components as part of Hadoop and extend them as the framework evolves.
>> Users depend heavily on SequenceFile for storing their data in Hadoop and
>> these components should not be deprecated as Doug has suggested.
>> 2. Should MapReduce support non-Writables through the pipeline out of the
>> box?
>> There has also been a discussion about whether we should support
>> non-Writables natively. There is already library code in Avro that lets
>> users use Avro types in a custom MapReduce API. A general MapReduce API that
>> encompasses all of the serialization frameworks and does not lock users into
>> a particular one is much more powerful.
>> Furthermore, making it convenient for the users, by including the plugins
>> in the default configuration and class path, will enable the use of Avro,
>> Thrift and ProtoBuf objects by people who would rather not focus on
>> serialization. Avro and Writables should not be the only first class
>> serializations that Hadoop supports by default.
>> 3. Should a framework dependency on ProtoBuf be allowed?
>> Doug has added several framework dependencies on Avro. The question is
>> whether it is acceptable to use the ProtoBuf library in the framework. Avro
>> is good for uses where there are a lot of objects of the same type. ProtoBuf
>> is better for a small number of objects. The question is whether Avro, JSON,
>> and XML should be the only serialization libraries that are acceptable to
>> use in the framework.
