hadoop-general mailing list archives

From Owen O'Malley <omal...@apache.org>
Subject Re: [PROPOSAL] new subproject: Avro
Date Fri, 03 Apr 2009 04:28:53 GMT

On Apr 2, 2009, at 5:11 PM, Abhishek Verma wrote:

> I am a newbie here. Why not use something existing like protocol buffers:
> http://code.google.com/p/protobuf/ which is open source and works
> amazingly well.

There are two blockers for protocol buffers that make them suboptimal  
for Hadoop. They are:

1. Protocol buffers are open source, but the community isn't open.  
Google doesn't seem interested in accepting patches from outside the
company. If we needed something changed in protocol buffers, we would
have to fork the project to make any progress.

2. Protocol buffers (and Thrift) encode the field names as id numbers.
That means that if you read the data into a dynamic language like
Python, it has to use the field numbers instead of the field names. In
Avro, the field names are saved and there are no field ids.
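To make point 2 concrete, here is a minimal sketch of decoding a protocol-buffers-style tagged message without its .proto file. It assumes only varint fields (wire type 0) and is not the protobuf library's API; the function names are hypothetical. Each field is prefixed by a key of (field_number << 3) | wire_type, so all a generic reader can recover is a mapping from numbers to values:

```python
def read_varint(buf, pos):
    """Decode a base-128 varint starting at buf[pos]; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # low 7 bits carry data
        if not byte & 0x80:               # high bit clear: last byte
            return result, pos
        shift += 7


def decode_tagged(buf):
    """Decode a message made only of varint fields (wire type 0).

    Without the schema, the result is keyed by field number, not name.
    """
    fields = {}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x7
        if wire_type != 0:
            raise ValueError("this sketch only handles varint fields")
        value, pos = read_varint(buf, pos)
        fields[field_number] = value
    return fields


# 0x08 = key for field 1, wire type 0; 0x96 0x01 = varint 150
print(decode_tagged(bytes([0x08, 0x96, 0x01])))  # {1: 150}
```

A Python program reading this data gets `{1: 150}`; turning `1` back into a meaningful name like `id` requires the separately compiled schema.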

A final point is that since Avro doesn't inline any schema information
(such as field ids) in each record, the binary representation is much
tighter than protocol buffers.
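The size difference can be illustrated with a simplified sketch, not the Avro library itself. It assumes a record schema containing only `long` fields. Avro's binary encoding writes such a record as the concatenation of its field values (zig-zag varints) with no per-field tags, and the reader walks the schema, which travels separately (for example, once in a data file header), to recover the names. The tagged encoder included for comparison is a hypothetical protobuf-style stand-in that handles non-negative values only:

```python
import json

# Hypothetical example schema; only "long" fields are handled below.
SCHEMA = json.loads("""
{"type": "record", "name": "Example",
 "fields": [{"name": "id", "type": "long"},
            {"name": "count", "type": "long"}]}
""")


def write_varint(out, n):
    """Append non-negative n to out as a base-128 varint."""
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return


def read_varint(buf, pos):
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def zigzag(n):    # maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ...
    return (n << 1) ^ (n >> 63)


def unzigzag(z):
    return (z >> 1) ^ -(z & 1)


def encode_record(schema, record):
    """Avro-style: values only, in schema order; no names or ids on the wire."""
    out = bytearray()
    for field in schema["fields"]:
        write_varint(out, zigzag(record[field["name"]]))
    return bytes(out)


def decode_record(schema, buf):
    """Field names come back from the schema, not from the bytes."""
    result, pos = {}, 0
    for field in schema["fields"]:
        z, pos = read_varint(buf, pos)
        result[field["name"]] = unzigzag(z)
    return result


def encode_tagged(record, numbering):
    """Protobuf-style comparison: a key varint precedes every value."""
    out = bytearray()
    for field_number, name in numbering:
        write_varint(out, (field_number << 3) | 0)  # wire type 0 = varint
        write_varint(out, record[name])             # non-negative only
    return bytes(out)


rec = {"id": 150, "count": 3}
print(encode_record(SCHEMA, rec).hex())                         # ac0206
print(len(encode_tagged(rec, [(1, "id"), (2, "count")])))       # 5
print(decode_record(SCHEMA, encode_record(SCHEMA, rec)))        # {'id': 150, 'count': 3}
```

Here the untagged record takes 3 bytes against 5 for the tagged one, and the saving grows with the number of small fields per record.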

-- Owen
