hadoop-general mailing list archives

From Kevin Clark <kevin.cl...@gmail.com>
Subject Re: [PROPOSAL] new subproject: Avro
Date Tue, 07 Apr 2009 00:17:57 GMT
Hi Doug,

On Mon, Apr 6, 2009 at 12:12 PM, Doug Cutting <cutting@apache.org> wrote:
> Chad Walters wrote:
>> -- You suggest that there is not a lot in Thrift that Avro can
>> leverage. I think you may be overlooking the fact that Thrift has a
>> user base and a community of developers who are very interested in
>> issues of cross-language data serialization and interoperability.
> I meant that in terms of common code, not coders.  Coders can belong to more
> than one community but code should generally not.  Hadoop Core has become a
> sprawling community that we're trying to split.  It's more productive to
> have more small communities than a few large ones.  A project needs a
> handful of active developers, but too many and it becomes ungainly.  So, if
> it's technically possible for a codebase to be distinct, and it can attract
> enough active developers to sustain itself, that is a preferable structure.

I agree with you in general, but cross-language libraries require
larger communities than other projects do. It's non-trivial to gather
groups of coders to support each language the project chooses to
include. Right now Thrift has some level of support for a dozen
languages. We've been very active over the last several months,
and developers have come out of the woodwork to extend the bindings
for their favorite languages. The overhead for those people (or some
equivalent group) to pay attention to another mailing list, another
bug tracker, another irc channel, and another community isn't trivial.
I understand that developing the code itself may be more convenient
for some, but I think that the community that supports the code is
what really counts. If we can share that, and still achieve our goals,
I think we'll be better off.

Of course, this assumes that one of the primary goals of Avro is to be
cross-language. Is that the case, or have I misunderstood?

> Avro has unions and a null type, while Thrift does not.  Does Thrift support
> recursive data structures?

We don't support recursive data structures. We do, however, have a
ticket open where we're discussing union support (THRIFT-409).

In your post you talk about the problems associated with supporting
multiple serialization formats. One of the things I like about Thrift
is that even though it supports many different formats, application
developers aren't at all obligated to use them all. In fact, I don't
expect anyone does. It would be perfectly reasonable for Hadoop to
specify that it uses the Avro data format for transmission, while
Thrift serves as the cross-language library that provides the API. I
think you said something similar in your post, but if not, please
clarify.

On the "names vs field ids" issue:

I know that the Ruby and Java Thrift libraries provide name-based
access to fields, and I know of no restriction that would keep the
other language libraries from doing the same. It's just a matter of a
little code.

>> Consider an alternative: making Avro more like a sub-project of
>> Thrift or just implementing it directly in Thrift.
> I looked into changing Thrift to support Avro's features, and it was very
> messy.  Perhaps someone else could do this more easily.
> Building Avro as a part of Thrift would take considerably more effort for me
> and I think offer little more than it does separately.  If you feel
> differently, you are free to fork Avro, start a competitor, provide patches
> that integrate it into Thrift, or whatever.

I'd again point out that the community is harder to develop than the
code, and we already have one. I also don't see the implementation
being especially difficult, but maybe we're looking at different
information. I'd be happy to talk it through with you if you're open
to the idea.

The goals of Avro seem consistent with the goals of each of the
Thrift contributors who have developed a new protocol. We can
already offer the things you've stated you don't want to develop, and
I think we've got a lot more to gain working together than we do
working separately.

That being said, I'm fairly confident we'll be providing an Avro
protocol on our own at some point if you're not interested in working
together. But I think if we go down that path we're doing a disservice
to users of both Thrift and Avro.

Kevin Clark
