hadoop-general mailing list archives

From George Porter <George.Por...@Sun.COM>
Subject Re: [PROPOSAL] new subproject: Avro
Date Fri, 03 Apr 2009 19:03:35 GMT

On Apr 3, 2009, at 11:37 AM, Doug Cutting wrote:
> Field ids are not present in Avro data except in the schema.  A
> record's fields are serialized in the order that the fields occur in
> the record's schema, with no per-field annotations whatsoever.  For
> example, a record that contains a string and an int is serialized
> simply as a string followed by an int, nothing before, nothing
> between and nothing after. So, yes, it is a different data format.
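
A minimal sketch of the layout described in the quoted text, for
illustration only. The primitive encodings used here (zigzag varint
for ints, a varint length prefix plus UTF-8 bytes for strings) and the
names SCHEMA, encode_long, encode_string and encode_record are
assumptions made for this example, not something stated in the
proposal.

```python
# Hypothetical schema: an ordered list of (field name, field type).
SCHEMA = [("name", "string"), ("age", "int")]


def encode_long(n: int) -> bytes:
    """Zigzag-encode a signed 64-bit integer, then emit it as a varint."""
    z = (n << 1) ^ (n >> 63)  # maps small negatives to small positives
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string(s: str) -> bytes:
    """Length prefix (as a varint) followed by the UTF-8 bytes."""
    raw = s.encode("utf-8")
    return encode_long(len(raw)) + raw


def encode_record(schema, record) -> bytes:
    """Concatenate field values in schema order, with no tags or ids."""
    out = bytearray()
    for field_name, field_type in schema:
        value = record[field_name]
        if field_type == "string":
            out += encode_string(value)
        elif field_type == "int":
            out += encode_long(value)
        else:
            raise ValueError(f"unsupported type: {field_type}")
    return bytes(out)


encoded = encode_record(SCHEMA, {"age": 42, "name": "hello"})
# One length byte, five bytes of text, one byte for the int.
print(len(encoded), encoded.hex())
```

Nothing in the output identifies which bytes belong to which field;
the field order comes entirely from the schema.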

While this representation would certainly be as compact as possible,  
wouldn't it prevent evolving the data structure over time?  One of the  
nice features of Google Protocol Buffers and Thrift is that you can  
evolve the set of fields over time, and older/newer clients can talk  
to older/newer services.  If the proposed Avro is evolvable, then  
perhaps I'm misunderstanding your statement about the lack of IDs in  
the serialized data.
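
To make the concern concrete, here is a minimal sketch of what happens
when untagged data is read with a field list that differs from the one
used to write it. The encoding (zigzag varints, length-prefixed
strings) and every name below are assumptions for illustration; the
sketch says nothing about how Avro itself handles this situation.

```python
def write_long(n: int) -> bytes:
    """Zigzag varint, assumed here as the integer encoding."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        out.append(byte | 0x80 if z else byte)
        if not z:
            return bytes(out)


def read_long(buf: bytes, pos: int):
    """Inverse of write_long; returns (value, next position)."""
    shift = 0
    z = 0
    while True:
        byte = buf[pos]
        pos += 1
        z |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1), pos


def write_record(fields, record) -> bytes:
    """Write values in field order with no per-field annotations."""
    out = bytearray()
    for name, kind in fields:
        if kind == "string":
            raw = record[name].encode("utf-8")
            out += write_long(len(raw)) + raw
        else:  # "int"
            out += write_long(record[name])
    return bytes(out)


def read_record(fields, buf: bytes) -> dict:
    """Read values by walking the given field list over the bytes."""
    pos = 0
    record = {}
    for name, kind in fields:
        if kind == "string":
            length, pos = read_long(buf, pos)
            record[name] = buf[pos:pos + length].decode("utf-8")
            pos += length
        else:  # "int"
            record[name], pos = read_long(buf, pos)
    return record


# Hypothetical old and new field lists; the new one adds a leading "id".
OLD_FIELDS = [("name", "string"), ("age", "int")]
NEW_FIELDS = [("id", "int"), ("name", "string"), ("age", "int")]

original = {"id": 3, "name": "hello", "age": 42}
data = write_record(NEW_FIELDS, original)

same_fields = read_record(NEW_FIELDS, data)  # round-trips correctly
old_reader = read_record(OLD_FIELDS, data)   # misreads without any error
print(same_fields)
print(old_reader)
```

With the writer's field list the record round-trips; with the older
list the reader treats the new id as a string length and returns
garbage without raising an error.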

I also agree with Bryan that it would be unfortunate to have two
different Apache projects with overlapping goals.  Regardless of
features, both Protocol Buffers and Thrift have the advantage of
having been debugged in mission-critical production environments.

-George
