hadoop-general mailing list archives

From "Jim Kellerman (POWERSET)" <Jim.Keller...@microsoft.com>
Subject RE: [PROPOSAL] new subproject: Avro
Date Fri, 03 Apr 2009 22:10:17 GMT
> -----Original Message-----
> On Apr 2, 2009, at 3:05 PM, Doug Cutting wrote:
> I propose we add a new Hadoop subproject for Avro, a serialization
> system.  My ambition is for Avro to replace both Hadoop's RPC and to
> be used for most Hadoop data files, e.g., by Pig, Hive, etc.
> Initial committers would be Sharad Agarwal and me, both existing
> Hadoop committers.  We are the sole authors of this software to date.
> The code is currently at:
> http://people.apache.org/~cutting/avro.git/
> To learn more:
> git clone http://people.apache.org/~cutting/avro.git/ avro
> cat avro/README.txt
> Comments?  Questions?
> Doug

After reading all the messages about Avro, I'm still not sure I understand
why we should invent "yet another wheel". There are a number of people in
the community who have significant investments in Thrift, and I have yet
to see a compelling argument for Avro over Thrift.

My understanding is that Thrift already supports multi-language bindings,
something the HBase community has been requesting for some time.

It is also my understanding (based on the email thread) that Avro currently
supports only Java and Python. That is a step backwards from Thrift.

It appears that Avro uses introspection heavily, which is expensive in
applications that require a high message rate.
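For readers unfamiliar with the schema-based approach being discussed, here is a minimal sketch of the idea that when writer and reader share a schema at runtime, records can omit per-field names and tags and encode only the values in schema order. The schema, field names, and toy wire format below are invented for illustration; this is not Avro's actual encoding.

```python
import struct

# Toy schema: an ordered list of (field name, type) pairs, known to both
# the writer and the reader. Because the schema carries the field names
# and order, the encoded record itself contains only raw values.
SCHEMA = [("user_id", "int"), ("payload", "string")]

def encode_with_schema(record):
    """Encode a dict by walking the shared schema in order."""
    out = bytearray()
    for name, typ in SCHEMA:
        value = record[name]
        if typ == "int":
            out += struct.pack(">q", value)          # 8-byte big-endian int
        elif typ == "string":
            data = value.encode("utf-8")
            out += struct.pack(">i", len(data)) + data  # length prefix + bytes
    return bytes(out)

def decode_with_schema(buf):
    """Decode by walking the same schema; no tags or names in the data."""
    record, offset = {}, 0
    for name, typ in SCHEMA:
        if typ == "int":
            (record[name],) = struct.unpack_from(">q", buf, offset)
            offset += 8
        elif typ == "string":
            (length,) = struct.unpack_from(">i", buf, offset)
            offset += 4
            record[name] = buf[offset:offset + length].decode("utf-8")
            offset += length
    return record

msg = {"user_id": 42, "payload": "hello"}
assert decode_with_schema(encode_with_schema(msg)) == msg
```

The performance question raised above is essentially about where the field lookup happens: per record via reflection/introspection, or once up front via a schema like the one sketched here.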

So I guess my question is: why Avro? I may be thick, but it seems to me
to be just another wheel of a different color. A point-by-point comparison
between Avro and Thrift might convince me that Avro is the way to go, but
so far I have not seen any compelling reason to re-invent the wheel.


Jim Kellerman, Powerset (Live Search, Microsoft Corporation)
