hadoop-general mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: [PROPOSAL] new subproject: Avro
Date Fri, 03 Apr 2009 18:37:51 GMT
Bryan Duxbury wrote:
> It's not actually a different data format, is it? You're saying that the 
> user wouldn't specify the field IDs, but you'd fundamentally still use 
> field ids for compactness and the like.

Field ids are not present in Avro data except in the schema.  A record's 
fields are serialized in the order that the fields occur in the record's 
schema, with no per-field annotations whatsoever.  For example, a record 
that contains a string and an int is serialized simply as a string 
followed by an int, nothing before, nothing between and nothing after. 
So, yes, it is a different data format.
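
To make that concrete, here is a minimal sketch of what such a record 
could look like on the wire. It assumes the binary encoding described in 
the Avro specification (ints and longs as zig-zag variable-length 
integers, strings as a length followed by UTF-8 bytes). The function 
names are hypothetical and this is not the reference implementation.

```python
def encode_long(n: int) -> bytes:
    """Zig-zag encode a signed integer, then emit it as a base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ...
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """A string is its UTF-8 byte length (as a long) followed by the bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_record(name: str, count: int) -> bytes:
    """Hypothetical record with schema fields (string, int), in that order.

    No field ids, tags, or delimiters are written: only the field values,
    concatenated in schema order.
    """
    return encode_string(name) + encode_long(count)


print(encode_record("foo", 27).hex())  # 06666f6f36
```

The five bytes are the length 3 (06), the bytes of "foo" (66 6f 6f), 
and the int 27 (36). A reader cannot interpret them without the schema, 
since nothing in the data says where one field ends and the next begins.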

> The bottom line is that I would love to see greater cooperation between 
> Hadoop and Thrift. Unless it's impossible or impractical for Thrift to 
> be useful here, I think we'd be willing to work towards Hadoop's needs.

Perhaps Thrift could be augmented to support Avro's JSON schemas and 
serialization.  Then it could interoperate with other Avro-based 
systems.  But then Thrift would have yet another serialization format, 
one that every language would need to implement for it to be useful...

Avro will only ever have one serialization format.  Thrift fundamentally 
standardizes an API, not a data format.  Avro fundamentally is a data 
format specification, like XML.  Thrift could implement this 
specification.  The Avro project includes reference implementations, but 
the format is intended to be simple enough and the specification stable 
enough that others might reasonably develop alternate, independent 
implementations.

Doug
