hadoop-general mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: [VOTE] Direction for Hadoop development
Date Wed, 08 Dec 2010 18:12:09 GMT
On 12/07/2010 02:37 PM, Roy T. Fielding wrote:
> Good, these are technical reasons.  The benefits can be cleared by docs.
> By incompatible, I assume you mean forward-compatibility of old versions
> of Hadoop reading newer files.  Can we fix that by having the new
> implementation use the old file format by default until it is configured
> to use one of the new interfaces for writing?


> You keep referring to the kernel as if it were a product.  I don't see
> a kernel product in the list of things released by Apache Hadoop.

The line is fairly clear.  The kernel is the daemons plus the framework 
code that invokes user code.  The set of pluggable user implementations 
is fairly small: InputFormat, OutputFormat, Mapper, Reducer, RawComparator.

SequenceFile was originally part of the kernel but is now only used by 
user-level InputFormats and OutputFormats.

> If there were such a product, then it would make sense for Apache Hadoop
> to also release ancillary products for common libraries, test frameworks,
> and modular storage interfaces.  Rearchitecting the Hadoop product suite
> into such a logical arrangement would make sense, and after such an
> architecture is put into place then "keeping the kernel simple" would
> be a reason to veto a change to the kernel.

Such a re-arrangement has been proposed but not completed.  Relevant 
issues are MAPREDUCE-1638, MAPREDUCE-1453, and MAPREDUCE-1700.  It 
mostly involves build issues; the architecture already largely supports 
the distinction.

>> Tom long ago provided patches showing how the existing
>> configuration system can provide equivalent extension
>> implementations outside of the kernel with no incompatible changes.
>> (MAPREDUCE-376 and MAPREDUCE-377)
> They both seem to be active and unfinished.  If they are equivalent fixes
> to the same problem, then I suggest applying them to a branch, documenting
> how they work, and then agreeing to have a bake-off.  A bake-off is a
> decision made by performance and feature-completeness as an objective
> way to resolve an impasse due to mutually exclusive vetoes.  All sides agree
> to drop the veto and accept whichever performs best, by majority decision.

A bake-off could be a good way to resolve this.  Performance differences 
would not likely be measurable, but folks might examine user programs 
and consider compatibility and support implications and vote accordingly.

> All action items can be voted on.  What we are talking about here is a
> short term plan, and it is listed as a type of action item under
> changes to products.

Then voting on specific short-term actions might be a good way to 
resolve this.

Some specific short-term questions we might vote on:

1. Should we add specific versions of Protocol Buffers and Thrift to the 
classpath of every MapReduce program?

2. Should SequenceFile be forward-compatible, i.e., if an existing 
program that stores Writables in a SequenceFile is run against the new 
version, should the old version still be able to read the output of the 
new version?

3. Should we continue to support a specified interchange format and/or data 
model for configuration data, or should configurations rather be opaque 
binary data?  An interchange format might be JSON.  An interchange data 
model might be Map<String,Value>, where values can be strings, booleans, 
numbers, bytes or nested configuration data, defined by a standard API 
that all configurable items would support.  A specified format or model 
would permit things like using -D to set configuration options and 
permit generic interaction with external configuration systems.  With 
opaque binary configurations, each configurable item would provide its 
own API and would require specific new code that calls this API for each 
parameter that could be set with -D or from an external configuration 
system.
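
To make the specified-data-model option in question 3 concrete, here is a 
minimal Java sketch. It is an illustration under stated assumptions, not 
Hadoop's Configuration API: the class ConfigModelSketch, its methods, and 
the example.* keys are all hypothetical, Object stands in for the Value 
type named above, and the rule for guessing a value's type from its text 
is just one possible choice.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a generic configuration data model.
// Not Hadoop's Configuration class; every name here is an assumption.
final class ConfigModelSketch {

    // Turn raw text into a boolean or a number, falling back to a string.
    static Object parseValue(String raw) {
        if (raw.equals("true") || raw.equals("false")) {
            return Boolean.valueOf(raw);
        }
        try {
            return Long.valueOf(raw);
        } catch (NumberFormatException notAnInteger) {
            // Not an integer; try a floating-point number next.
        }
        try {
            return Double.valueOf(raw);
        } catch (NumberFormatException notANumber) {
            return raw;
        }
    }

    // Apply one "key=value" option, as a generic -D handler might.
    // No per-parameter code is needed because the model is shared.
    static void applyOption(Map<String, Object> config, String option) {
        int eq = option.indexOf('=');
        if (eq <= 0) {
            throw new IllegalArgumentException("expected key=value, got: " + option);
        }
        config.put(option.substring(0, eq), parseValue(option.substring(eq + 1)));
    }

    public static void main(String[] args) {
        Map<String, Object> config = new LinkedHashMap<>();
        applyOption(config, "example.buffer.mb=200");
        applyOption(config, "example.compress=true");
        applyOption(config, "example.codec=gzip");

        // Nested configuration data is just another map value.
        Map<String, Object> nested = new LinkedHashMap<>();
        applyOption(nested, "ratio=0.5");
        config.put("example.nested", nested);

        System.out.println(config);
    }
}
```

With opaque binary configurations, applyOption could not be written once: 
each configurable item would need its own code to turn a -D string into a 
call on that item's API.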

>>> They are also subject to veto if and only if they
>>> are to be applied to the current release branch (or a released branch).
>> Owen intends to merge this patch to a release branch.
> Right.

So votes on action items would be simple majority if they're not 
intended to be merged to a release branch, and vetoable if they are?  Is 
that right?

