hadoop-general mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: [VOTE] Direction for Hadoop development
Date Mon, 29 Nov 2010 23:14:28 GMT

First, I don't see the yes/no issue that you'd like us to vote on here.
We vote on patches.  We vote on releases.  We vote on committers.  We
don't vote on a project direction statement.  Rather, folks present
plans, others may present their conflicting concerns, and we need to get
these to meet in order to make progress on a particular issue.

I too support continuing support for SequenceFile.

I too support adding flexible serialization APIs to MapReduce.

I do not support extending SequenceFile's format in substantial ways.  A 
proliferation of expressively equivalent yet incompatible file formats 
hinders the interoperable evolution of the Hadoop ecosystem.

I do not support adding new dependencies to the classpath of MapReduce 
user tasks.  We want to provide as much flexibility to user code as 
possible.  The more libraries the system includes, the greater the
potential for version conflicts.  As the Hadoop ecosystem expands, 
MapReduce should seek primarily to be an efficient, reliable kernel, not 
an extensive library of tools.

So I agree with some of your points, but not with others.



On 11/29/2010 02:30 PM, Owen O'Malley wrote:
> All,
> Based on the discussion on HADOOP-6685, there is a pretty fundamental
> difference of opinion about how Hadoop should evolve. We need to figure
> out how the majority of the PMC wants the project to evolve to
> understand which patches move us forward. Please vote whether you
> approve of the following direction. Clearly as the author, I'm +1.
> -- Owen
>
> Hadoop has always included library code so that users had a strong
> foundation to build their applications on without needing to continually
> reinvent the wheel. This combination of framework and powerful library
> code is a common pattern for successful projects, such as Java, Lucene,
> etc. Toward that end, we need to continue to extend the Hadoop library
> code and actively maintain it as the framework evolves. Continuing
> support for SequenceFile and TFile, which are both widely used, is
> mandatory. The opposite pattern of implementing the framework and
> letting each distribution add the required libraries will lead to
> increased community fragmentation and vendor lock in.
>
> Hadoop's generic serialization framework had a lot of promise when it
> was introduced, but has been hampered by a lack of plugins other than
> Writables and Java serialization. Supporting a wide range of
> serializations natively in Hadoop will give the users new capabilities.
> Currently, to support Avro or ProtoBuf objects mutually incompatible
> third party solutions are required. It benefits Hadoop to support them
> with a common framework that will support all of them. In particular,
> having easy, out of the box support for Thrift, ProtoBufs, Avro, and our
> legacy serializations is a desired state.
>
> As a distributed system, there are many instances where Hadoop needs to
> serialize data. Many of those applications need a lightweight, versioned
> serialization framework like ProtocolBuffers or Thrift and using them is
> appropriate. Adding dependences on Thrift and ProtocolBuffers to the
> previous dependence on Avro is acceptable.
