hadoop-general mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: [PROPOSAL] new subproject: Avro
Date Tue, 07 Apr 2009 16:16:39 GMT
Chad Walters wrote:
> I do think, however, that it will be very
> difficult for them to work together properly if the goal of code
> reuse by Thrift is not an explicit goal of Avro.

Code reuse is an explicit goal of Avro.  It's an open source project 
with public APIs intended to expose all of its functionality.

> I think that by working closely with the Thrift community directly in
> the Thrift code base, you will get several significant benefits.

It's not like I did not consider this approach, evolving Thrift to 
better support my needs.  In fact, I considered it for months before 
abandoning it.  I am very familiar with these arguments.

I am starting a new serialization project fully aware of the hazards.  I 
feel that, on balance, it is considerably simpler for Avro to be 
developed separately and that this will not adversely affect its users 
or its developer community.  You may disagree.  As volunteers here, we 
are both free to do as we choose.

> To support the second use case, dynamic schema interpretation, there
> is definitely significant new code to be written. Note that this code
> is essentially the same code wherever you are writing it.

This is a primary case for Avro.  Without it, Avro's a non-starter. 
And, as you note, this is new code that must be written for each 
platform.  That's primarily what Avro is.  Fitting this code into Thrift 
would only make it more complicated.
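
For illustration only, a minimal sketch of what "dynamic schema interpretation" means here: the schema is parsed as data at runtime and walked directly, with no generated classes. This is not Avro's code. The schema contents, the `validate` function, and the small set of types it handles are all assumptions made for the example.

```python
import json

# Hypothetical schema, loaded at runtime; no code is generated from it.
SCHEMA = json.loads("""
{"type": "record", "name": "Point",
 "fields": [{"name": "x", "type": "int"},
            {"name": "label", "type": "string"}]}
""")


def validate(schema, datum):
    """Return True if datum conforms to schema, walking the schema dynamically."""
    # A schema is either a bare type name ("int") or a dict with a "type" key.
    kind = schema["type"] if isinstance(schema, dict) else schema
    if kind == "int":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(datum, int) and not isinstance(datum, bool)
    if kind == "string":
        return isinstance(datum, str)
    if kind == "record":
        return isinstance(datum, dict) and all(
            f["name"] in datum and validate(f["type"], datum[f["name"]])
            for f in schema["fields"]
        )
    raise ValueError(f"unsupported type in this sketch: {kind}")
```

The same walk over the schema has to be written once per platform, which is the "new code" both sides of the thread refer to.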

> Whatever
> work you are doing in Avro to be able to dynamically interpret JSON
> IDL could just be directly implemented in Thrift -- we would just
> define a JSON version of the Thrift IDL which would look a lot like
> Avro's IDL. To help further with interoperability we could make the
> Thrift compiler generate the JSON IDL from the Thrift IDL as another
> output target.

Sure, we could bolt Avro's features onto the side of Thrift, but that 
doesn't make it easier for me to deliver Avro's features nor any easier 
for folks to use them.  And Thrift doesn't need a second IDL format.  It 
already suffers from too many formats.  I seek a single format, not a 

> The basic upshot of the above is that it is not that hard to see how
> Avro could be directly integrated into Thrift if you were willing to
> entertain that option and I believe that you would see significant
> benefits that would more than offset the impact to your own ease of
> development about which you expressed concerns.

I am unlikely to implement it myself, as it does not address my needs.

> I am proposing that the IDL would
> only allow for field IDs to be omitted in the case where the schema
> was being interpreted dynamically -- no static bindings could be
> generated from IDL without fully specified field IDs. So if you are
> only interested in dynamic interpretation, you never have to look at
> or even think about field IDs. Does that in any way alter your stance
> here?

Not really.  It adds an "except on Tuesday" clause in the specification, 
which is not ideal.  In Avro we can generate static bindings without 
using field ids.
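
One way to picture working without field ids, as a hedged sketch rather than Avro's actual implementation: values are written in the writer's schema order with no per-field tags, and the reader, given the writer's schema, matches fields by name. The `encode` and `decode` helpers, the list-based "wire format", and the example schemas are hypothetical.

```python
def encode(writer_schema, record):
    """Write values in schema field order, with no per-field ids or tags."""
    return [record[f["name"]] for f in writer_schema["fields"]]


def decode(writer_schema, reader_schema, values):
    """Rebuild a record for the reader, matching fields by name."""
    # Pair each written value with the writer's field name for that position.
    written = dict(zip((f["name"] for f in writer_schema["fields"]), values))
    result = {}
    for f in reader_schema["fields"]:
        if f["name"] in written:
            result[f["name"]] = written[f["name"]]
        elif "default" in f:
            # Field unknown to the writer: fall back to the reader's default.
            result[f["name"]] = f["default"]
        else:
            raise ValueError(f"no value or default for field {f['name']!r}")
    return result


# Hypothetical writer and reader schemas that differ by one field each.
WRITER = {"type": "record", "name": "User",
          "fields": [{"name": "id", "type": "int"},
                     {"name": "name", "type": "string"}]}
READER = {"type": "record", "name": "User",
          "fields": [{"name": "name", "type": "string"},
                     {"name": "email", "type": "string", "default": ""}]}

decoded = decode(WRITER, READER, encode(WRITER, {"id": 7, "name": "ann"}))
```

The sketch depends on the reader having the writer's schema available; field names, not numeric ids, carry the correspondence between the two.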

>> It could be a floor wax and a dessert topping!
> Love the SNL reference, but I don't think it is really apropos. My
> vision for Thrift with Avro's features folded in as a unified
> framework for cross-language serialization, covering a variety of use
> cases, is not jamming two completely heterogeneous things together. I
> can easily see wanting to take structures represented in one
> serialization format from disk and send them out over RPC. Thrift
> provides the means to do this kind of thing seamlessly, with formats
> appropriate to both use cases, rather than selecting a format that is
> good for one use case and so-so for the other.

I believe that the cost of supporting multiple formats is too high.  We 
differ on that point.  I don't think one-stop-shopping is appropriate 
here, but prefer to provide an à la carte format.

