hadoop-general mailing list archives

From "Owen O'Malley" <...@yahoo-inc.com>
Subject [VOTE] Direction for Hadoop development
Date Mon, 29 Nov 2010 22:30:41 GMT
    Based on the discussion on HADOOP-6685, there is a pretty  
fundamental difference of opinion about how Hadoop should evolve. We  
need to figure out how the majority of the PMC wants the project to  
evolve to understand which patches move us forward. Please vote  
whether you approve of the following direction. Clearly as the author,  
I'm +1.

-- Owen

Hadoop has always included library code so that users had a strong  
foundation to build their applications on without needing to  
continually reinvent the wheel. This combination of framework and  
powerful library code is a common pattern for successful projects,  
such as Java, Lucene, etc. Toward that end, we need to continue to  
extend the Hadoop library code and actively maintain it as the  
framework evolves. Continuing support for SequenceFile and TFile,
which are both widely used, is mandatory. The opposite pattern of
implementing the framework and letting each distribution add the
required libraries will lead to increased community fragmentation and
vendor lock-in.

Hadoop's generic serialization framework had a lot of promise when it  
was introduced, but has been hampered by a lack of plugins other than  
Writables and Java serialization. Supporting a wide range of  
serializations natively in Hadoop will give the users new  
capabilities. Currently, to support Avro or ProtoBuf objects, mutually
incompatible third-party solutions are required. It benefits Hadoop to
support them with a common framework that will support all of them. In
particular, having easy, out-of-the-box support for Thrift, ProtoBufs,
Avro, and our legacy serializations is a desired state.

As a distributed system, there are many instances where Hadoop needs  
to serialize data. Many of those applications need a lightweight,  
versioned serialization framework like ProtocolBuffers or Thrift and  
using them is appropriate. Adding dependencies on Thrift and
ProtocolBuffers to the previous dependency on Avro is acceptable.