hadoop-general mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: [PROPOSAL] new subproject: Avro
Date Mon, 13 Apr 2009 17:54:18 GMT
Ankur Goel wrote:
> How fast do we expect the new serialization system to be when it
> replaces existing serialization mechanism in Hadoop RPC?

I hope that Avro will make its first release this summer.  Soon after
that, I hope we can start moving the RPC in Hadoop Core's trunk onto
Avro.  We may start developing an experimental version of Hadoop Core
that uses Avro in a branch before Avro is released.  This is all
speculative, of course.  Detailed discussion of Hadoop Core's future
belongs on core-dev@, and of Avro's future on avro-dev@.

> A clear description of the existing bottlenecks and the performance
> goals for this system would help developers interested in
> contributing.

Adding Avro to Hadoop Core is not primarily about performance but rather 
about compatibility and security.

Hadoop's existing RPC is not a performance bottleneck, nor is HDFS's
data transfer protocol.  However, Hadoop currently requires clients and
servers to run exactly the same version of the code, since the existing
RPC does not tolerate protocol changes.  We'd like to change that, so
that one can run older clients against newer servers and vice versa.
Longer term, we'd also like to permit clients in languages other than
Java.  We intend Avro to provide a change-tolerant, cross-platform RPC
solution.
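
To make "change-tolerant" concrete, here is a minimal sketch of the idea
in Python.  It is an illustration only: it does not use Avro's API or
wire format, and the names (resolve, REQUIRED, V1_FIELDS, V2_FIELDS, the
"replication" field) are hypothetical.  It assumes each side knows its
own list of fields with optional defaults, fills in defaults for fields
the sender did not supply, and ignores fields it does not know about.

```python
# Illustrative sketch only: not Avro's API or wire format.
REQUIRED = object()  # marker meaning "this field has no default"


def resolve(record, reader_fields):
    """Project a received record onto the fields the reader expects.

    reader_fields maps field name -> default value (or REQUIRED).
    Fields missing from the record take the reader's default;
    fields the reader does not know about are dropped.
    """
    resolved = {}
    for name, default in reader_fields.items():
        if name in record:
            resolved[name] = record[name]
        elif default is not REQUIRED:
            resolved[name] = default
        else:
            raise ValueError(f"missing required field: {name}")
    return resolved


# Hypothetical request layouts for two versions of the same RPC call.
V1_FIELDS = {"path": REQUIRED}
V2_FIELDS = {"path": REQUIRED, "replication": 3}

old_request = {"path": "/data/part-0"}                    # sent by a v1 client
new_request = {"path": "/data/part-0", "replication": 2}  # sent by a v2 client

# Newer server, older client: the default fills the missing field.
server_view = resolve(old_request, V2_FIELDS)

# Older server, newer client: the unknown field is ignored.
legacy_view = resolve(new_request, V1_FIELDS)
```

With rules along these lines, either side can be upgraded first, as long
as newly added fields carry defaults.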

We'd also like Hadoop to become more secure.  Currently Hadoop uses
three different communication mechanisms: RPC, HTTP (for the shuffle)
and a raw socket-based protocol for HDFS data transfers.  It would be best not
to have to re-implement security features for each of these.  So we hope 
that we can make Avro perform well enough to replace not only Hadoop's 
RPC, but also HTTP in the shuffle and the HDFS data transfer protocol.

If you're interested in discussing Avro further, I encourage you to join 
the Avro mailing lists.

http://hadoop.apache.org/avro/mailing_lists.html

Doug
