hadoop-general mailing list archives

From Bryan Duxbury <br...@rapleaf.com>
Subject Re: [PROPOSAL] new subproject: Avro
Date Fri, 03 Apr 2009 17:50:14 GMT
> With the schema in hand, you don't need to tag data with field  
> numbers or types, since that's all there in the schema.  So, having  
> the schema, you can use a simpler data format.

To a degree, we already have that in Thrift - we call it the  
DenseProtocol.
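To illustrate the point being quoted: when both sides hold the schema, the per-field id and type tags can be dropped from the wire. A rough Python sketch of the two styles (illustrative only, not actual Thrift or Avro code; the type codes and field ids are made up):

```python
import struct

def encode_tagged(user_id, name):
    # Tagged style: each field carries a (type code, field id) header,
    # so a reader can skip unknown fields without knowing the schema.
    # Type codes 8 (i32) and 11 (string) are illustrative here.
    out = b""
    out += struct.pack(">BHi", 8, 1, user_id)            # type, id, value
    name_bytes = name.encode("utf-8")
    out += struct.pack(">BHi", 11, 2, len(name_bytes))   # type, id, length
    out += name_bytes
    out += b"\x00"                                       # stop marker
    return out

def encode_untagged(user_id, name):
    # Schema-resolved style: both sides agree on the schema, so only
    # the values are written, in schema order -- no ids or type codes.
    name_bytes = name.encode("utf-8")
    return struct.pack(">i", user_id) + struct.pack(">i", len(name_bytes)) + name_bytes

tagged = encode_tagged(42, "bryan")
untagged = encode_untagged(42, "bryan")
print(len(tagged), len(untagged))  # the untagged form is strictly smaller
```

The saving per field is the tag header; the trade-off is that the untagged reader must have the writer's schema to decode anything at all.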

> Would you write parsers for Thrift's IDL in every language?  Or  
> would you use JSON, as Avro does, to avoid that?

When it comes to having a code-usable IDL for the schema, I'm totally  
pro-JSON.

> Once you're using a different IDL and a different data format,  
> what's shared with Thrift?  Fundamentally, those two things define  
> a serialization system, no?

It's not actually a different data format, is it? You're saying that  
the user wouldn't specify the field IDs, but you'd fundamentally  
still use field IDs for compactness and the like. You may not use  
actual Thrift-generated objects, but you could certainly use the  
Binary or Compact protocol from Thrift to do all the writing to the wire.
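Much of the Compact protocol's compactness, as I understand it, comes from zigzag-plus-varint integer encoding (small magnitudes take few bytes) rather than from the field IDs themselves. A generic sketch of that encoding in Python (in the spirit of compact wire formats generally, not Thrift's actual implementation):

```python
def zigzag(n):
    # Map signed ints to unsigned so values near zero stay small:
    # 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...  (64-bit variant)
    return (n << 1) ^ (n >> 63)

def varint(n):
    # Base-128 varint: 7 payload bits per byte, high bit = continuation.
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)
        else:
            out.append(b)
            return bytes(out)

print(varint(zigzag(-1)).hex())   # a single byte for -1
print(varint(zigzag(300)).hex())  # two bytes for 300
```

The field-ID question and the integer-encoding question are separable, which is part of why reusing a Thrift protocol under an Avro-style schema seems plausible to me.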

You might also be able to use (or contribute to) Thrift's RPC-level  
stuff like server implementations. We have some respectable Java  
servers written, and if those aren't enough for your uses, I'd  
actually be really interested in seeing if we could generalize some  
of the Hadoop stuff to be useful within Thrift.

The bottom line is that I would love to see greater cooperation  
between Hadoop and Thrift. Unless it's impossible or impractical for  
Thrift to be useful here, I think we'd be willing to work towards  
Hadoop's needs.

-Bryan

