hadoop-general mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: [PROPOSAL] new subproject: Avro
Date Fri, 03 Apr 2009 22:44:57 GMT
Jim Kellerman (POWERSET) wrote:
> It is also my understanding (based on the email thread) that Avro only
> supports Java and python. That is a step backwards from Thrift.

We intend to add support for more languages.  Avro is not complete.

> It appears that Avro uses introspection heavily, which is expensive in
> applications that require a high message rate.

It only uses introspection if you wish to use your existing Java classes 
to represent Avro data.  There are three representations in Java: 
generic (uses Map<String,Object> for records, List<Object> for arrays), 
specific (generates a Java class for each Avro record, like Thrift) and 
reflect (uses reflection to access existing classes).  So introspection 
is optional.  And, while introspection is indeed slow for processing 
file-based data, it would probably not be a bottleneck for most RPC 
protocols, and it might be a useful tool for migrating existing code to 
Avro.
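
For a concrete picture of the generic case, here is a minimal sketch in 
Java.  (It uses the Schema.Parser and GenericData.Record entry points 
from later Avro releases rather than the raw Map representation 
described above; the "User" record and its fields are invented for the 
example.)

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericRecord;

    public class GenericExample {
      public static void main(String[] args) {
        // The schema is plain JSON built at runtime; no IDL file and
        // no code generation step.
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
            + "{\"name\":\"name\",\"type\":\"string\"},"
            + "{\"name\":\"age\",\"type\":\"int\"}]}");

        // Generic representation: fields are set and read by name,
        // with no generated User class anywhere.
        GenericRecord user = new GenericData.Record(schema);
        user.put("name", "Ada");
        user.put("age", 36);
        System.out.println(user.get("name") + " is " + user.get("age"));
      }
    }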

> So I guess my question is why Avro?

The compelling case is dynamic data types.  Pig, Hive, Python, Perl, 
etc. scripts should not have to generate a Thrift IDL file each time 
they wish to write a data file with a new schema, nor should they need 
to run the Thrift compiler for each data file they wish to read.  For 
production applications, code-generation is not an imposition and may 
offer increased opportunities for optimization and error checking, but 
for exploration and experimentation, a very common use case for Hadoop, 
one would like to be able to browse datasets and build MapReduce 
programs more interactively.
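
As a sketch of what that interactivity looks like in code: reading an 
Avro data file requires no IDL and no generated classes, because the 
schema is embedded in the file itself.  (This assumes a file written in 
Avro's object container format and the DataFileReader API of later Avro 
releases; the file name comes from the command line.)

    import java.io.File;
    import org.apache.avro.file.DataFileReader;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericRecord;

    public class Browse {
      public static void main(String[] args) throws Exception {
        // The schema is stored in the file, so nothing needs to be
        // generated or compiled before reading.
        File f = new File(args[0]);
        DataFileReader<GenericRecord> reader =
            new DataFileReader<GenericRecord>(
                f, new GenericDatumReader<GenericRecord>());
        System.out.println("schema: " + reader.getSchema());
        for (GenericRecord record : reader) {
          System.out.println(record);
        }
        reader.close();
      }
    }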

Doug
