hadoop-general mailing list archives

From Sameer Paranjpye <sparanj...@yahoo.com>
Subject Re: [PROPOSAL] new subproject: Avro
Date Fri, 03 Apr 2009 20:27:10 GMT


While Protocol Buffers and Thrift have similar goals, Avro takes a different approach to schema
evolution and reconciliation. I feel that Avro's tighter data layout and its schema management
are better suited to many of Hadoop's and Pig's use cases for large data sets/tables on HDFS.
Field ids start to matter when millions of objects have hundreds of fields each: there is,
of course, the storage overhead, and schema management becomes hard, especially in cases
where field ids need to be assigned manually.
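
To put rough numbers on the field id overhead, here is a back-of-envelope sketch. It assumes
a Protocol Buffers style tag, i.e. a varint of (field_id << 3 | wire_type) written before every
field value, with every field present and ids assigned 1..N. The record and field counts are
made-up illustrations, not measurements. An encoding that writes values in schema order with
no per-field tag would spend none of these bytes.

```python
def varint_len(n: int) -> int:
    """Bytes needed to encode a non-negative integer as a base-128 varint."""
    length = 1
    while n >= 0x80:
        n >>= 7
        length += 1
    return length


def tag_overhead_per_record(num_fields: int) -> int:
    """Bytes spent on field tags in one record.

    Assumes every field is present and ids run 1..num_fields. The wire type
    occupies the low 3 bits of the tag, so it does not change the varint length.
    """
    return sum(varint_len(field_id << 3) for field_id in range(1, num_fields + 1))


# Hypothetical data set: one million records with 200 fields each.
records = 1_000_000
fields = 200

per_record = tag_overhead_per_record(fields)
total = per_record * records
print(f"{per_record} tag bytes per record, {total / 1e6:.0f} MB across the data set")
```

Under these assumptions ids 1 to 15 cost one byte each and ids 16 to 200 cost two, so the
tags alone come to 385 bytes per record before any actual data is stored.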

----- Original Message ----
From: Doug Cutting <cutting@apache.org>
To: general@hadoop.apache.org
Sent: Thursday, April 2, 2009 3:05:08 PM
Subject: [PROPOSAL] new subproject: Avro

I propose we add a new Hadoop subproject for Avro, a serialization system.  My ambition is
for Avro to replace Hadoop's RPC and to be used for most Hadoop data files, e.g., by
Pig and Hive.

Initial committers would be Sharad Agarwal and me, both existing Hadoop committers.  We are
the sole authors of this software to date.

The code is currently at:


To learn more:

git clone http://people.apache.org/~cutting/avro.git/ avro
cat avro/README.txt

Comments?  Questions?


