hadoop-general mailing list archives

From Owen O'Malley <omal...@apache.org>
Subject Re: [VOTE] Direction for Hadoop development
Date Tue, 14 Dec 2010 03:08:56 GMT

On Dec 7, 2010, at 2:37 PM, Roy T. Fielding wrote:

>> The proposal is to change the extension mechanism incompatibly with  
>> unclear benefits,
>
> Good, these are technical reasons.  The benefits can be cleared by  
> docs.
> By incompatible, I assume you mean forward-compatibility of old  
> versions
> of Hadoop reading newer files.  Can we fix that by having the new
> implementation use the old file format by default until it is  
> configured
> to use one of the new interfaces for writing?


There are two goals here. The first is to extend the serialization  
plugin interface. The current patch does things completely compatibly  
including a shim that will use the previous plugins to satisfy the new  
API. The benefits are also clear. Avro serialization is possible when  
it wasn't previously. It also provides a wide range of opportunities  
that weren't previously possible.
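
As a rough illustration of the shim idea only (this is not the code in the patch), the sketch below wraps a plugin written against an older interface so that it satisfies a newer one. Every name here (OldSerializer, NewSerialization, OldPluginShim, the "legacy" label) is hypothetical and chosen just for the example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: none of these types are Hadoop's actual interfaces.
class SerializationShimSketch {

    /** Stand-in for an "old" plugin interface that only knows how to turn a value into bytes. */
    interface OldSerializer<T> {
        byte[] serialize(T value);
    }

    /** Stand-in for an extended interface that also exposes a name for the serialization. */
    interface NewSerialization<T> {
        String name();
        byte[] serialize(T value);
    }

    /** Adapter: lets an existing old-style plugin be used wherever the new interface is expected. */
    static final class OldPluginShim<T> implements NewSerialization<T> {
        private final OldSerializer<T> delegate;

        OldPluginShim(OldSerializer<T> delegate) {
            this.delegate = delegate;
        }

        @Override
        public String name() {
            // The old interface carries no metadata, so the shim supplies a fixed label.
            return "legacy";
        }

        @Override
        public byte[] serialize(T value) {
            // Behavior is unchanged: the call is forwarded to the old plugin.
            return delegate.serialize(value);
        }
    }

    public static void main(String[] args) {
        OldSerializer<String> oldPlugin = s -> s.getBytes(StandardCharsets.UTF_8);
        NewSerialization<String> viaShim = new OldPluginShim<>(oldPlugin);
        System.out.println(viaShim.name() + " wrote " + viaShim.serialize("abc").length + " bytes");
    }
}
```

The point of the adapter is that existing plugins need no changes; only callers move to the new interface.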

The file format was changed as a demonstration that the serialization  
interface was useful and complete. The file change is also backwards  
compatible and will automatically read old versions of the file. Old  
versions of the code will complain with an error message if they are  
given a new version. This is exactly the pattern we have used in the  
past.
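
A minimal sketch of that versioning pattern, under stated assumptions: the file starts with a single version byte, the newer layout adds one extra header field, and a reader rejects any version above the one it supports. The layout and all names (VersionedFileSketch, OLD_VERSION, NEW_VERSION) are hypothetical and are not the SequenceFile format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of "new code reads old files, old code rejects new files".
class VersionedFileSketch {
    static final int OLD_VERSION = 1;
    static final int NEW_VERSION = 2;

    /** Writes a version byte, an extra int for the new layout only, then a length-prefixed payload. */
    static byte[] write(int version, String payload) {
        byte[] body = payload.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(1 + 4 + 4 + body.length);
        buf.put((byte) version);
        if (version >= NEW_VERSION) {
            buf.putInt(0xCAFE); // made-up header field that exists only in the new layout
        }
        buf.putInt(body.length);
        buf.put(body);
        byte[] out = new byte[buf.position()];
        buf.flip();
        buf.get(out);
        return out;
    }

    /** Reads a file as code that supports versions up to maxSupported would. */
    static String read(byte[] file, int maxSupported) {
        ByteBuffer buf = ByteBuffer.wrap(file);
        int version = buf.get() & 0xFF;
        if (version > maxSupported) {
            // Older code fails with a clear message instead of misreading the data.
            throw new IllegalStateException(
                "File version " + version + " is newer than supported version " + maxSupported);
        }
        if (version >= NEW_VERSION) {
            buf.getInt(); // skip the field that only the new layout has
        }
        byte[] body = new byte[buf.getInt()];
        buf.get(body);
        return new String(body, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] oldFile = write(OLD_VERSION, "hello");
        byte[] newFile = write(NEW_VERSION, "hello");
        System.out.println(read(oldFile, NEW_VERSION)); // new reader, old file
        System.out.println(read(newFile, NEW_VERSION)); // new reader, new file
        try {
            read(newFile, OLD_VERSION);                 // old reader, new file
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```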

So no, there are no technical issues with the patch as it stands.

> You keep referring to the kernel as if it were a product.  I don't see
> a kernel product in the list of things released by Apache Hadoop.

The kernel is a very loosely defined concept. Utilities that are  
currently used by the framework are "kernel"; others are just used by  
the users. Some classes are clearly kernel and some are clearly  
library, but there are some such as BooleanWritable that aren't  
obvious. It would take a fair amount of work and likely some  
duplication to segregate out the library code. I also worry that  
creating such a project would make Hadoop less useful out of the box  
and decrease the value of the Apache release of Hadoop.

But back to the original point. Doug's (and Tom's) veto was based on:
1. Modification to SequenceFile.
2. It introduces a dependence on Protocol Buffers.

There was strong consensus that SequenceFile was required and should  
be updated as the framework evolves. The second is not a technical  
reason. I believe that the entire veto should be considered invalid.

-- Owen
