Jim Kellerman (POWERSET) wrote:
> It is also my understanding (based on the email thread) that Avro only
> supports Java and Python. That is a step backwards from Thrift.
We intend to add support for more languages. Avro is not complete.
> It appears that Avro uses introspection heavily, which is expensive in
> applications that require a high message rate.
It only uses introspection if you wish to use your existing Java classes
to represent Avro data. There are three representations in Java:
generic (uses Map<String,Object> for records, List<Object> for arrays),
specific (generates a Java class for each Avro record, like Thrift), and
reflect (uses reflection to access existing classes). So introspection
is optional. And, while introspection is indeed slow for processing
file-based data, it would probably not be a bottleneck for most RPC
protocols, and it might be a useful tool for migrating existing code to Avro.
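To illustrate, here is a rough sketch of the reflect representation
against a recent Java API (untested; the User class and its fields are
invented for the example):

  import org.apache.avro.Schema;
  import org.apache.avro.reflect.ReflectData;

  public class ReflectDemo {
    // An ordinary, pre-existing Java class (name made up for the example).
    public static class User {
      public String name;
      public int age;
    }

    public static void main(String[] args) {
      // Reflection derives the Avro schema from the class at runtime,
      // so there is no IDL to write and no code-generation step.
      Schema schema = ReflectData.get().getSchema(User.class);
      System.out.println(schema.toString(true)); // schema as pretty-printed JSON
    }
  }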
> So I guess my question is why Avro?
The compelling case is dynamic data types. Pig, Hive, Python, Perl etc.
scripts should not have to generate a Thrift IDL file each time they
wish to write a data file with a new schema, nor should they need to run
the Thrift compiler for each data file they wish to read. For
production applications, code-generation is not an imposition and may
offer increased opportunities for optimization and error checking, but
for exploration and experimentation, a very common use case for Hadoop,
one would like to be able to browse datasets and build MapReduce
programs more interactively.
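For instance, a script can parse a schema from a plain JSON string and
write a data file with the generic representation, with no compiler run
on either side. A rough sketch (method names per recent Avro releases;
the schema, field names, and file name are invented for the example):

  import java.io.File;
  import org.apache.avro.Schema;
  import org.apache.avro.file.DataFileWriter;
  import org.apache.avro.generic.GenericData;
  import org.apache.avro.generic.GenericDatumWriter;
  import org.apache.avro.generic.GenericRecord;

  public class GenericDemo {
    public static void main(String[] args) throws Exception {
      // The schema is an ordinary JSON string built at runtime;
      // nothing here is generated ahead of time.
      Schema schema = new Schema.Parser().parse(
          "{\"type\":\"record\",\"name\":\"Pair\",\"fields\":["
        + "{\"name\":\"key\",\"type\":\"string\"},"
        + "{\"name\":\"value\",\"type\":\"long\"}]}");

      GenericRecord rec = new GenericData.Record(schema);
      rec.put("key", "clicks");
      rec.put("value", 42L);

      // Avro data files embed the schema, so a reader needs no
      // advance knowledge of it, and no generated classes.
      DataFileWriter<GenericRecord> writer =
          new DataFileWriter<GenericRecord>(
              new GenericDatumWriter<GenericRecord>(schema));
      writer.create(schema, new File("pairs.avro"));
      writer.append(rec);
      writer.close();
    }
  }

Since the schema travels with the file, a reader can open the file and
process its records knowing nothing about it in advance.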
Doug