From: Doug Cutting
Date: Fri, 03 Apr 2009 15:44:57 -0700
To: general@hadoop.apache.org
Subject: Re: [PROPOSAL] new subproject: Avro
Message-ID: <49D69169.2070208@apache.org>
References: <49D53694.1050906@apache.org> <3DBB0E23-7B6D-4541-A0D6-3687DF02C397@yahoo-inc.com>

Jim Kellerman (POWERSET) wrote:
> It is also my understanding (based on the email thread) that Avro only
> supports Java and python. That is a step backwards from Thrift.

We intend to add support for more languages. Avro is not complete.

> It appears that Avro uses introspection heavily, which is expensive in
> applications that require a high message rate.

It only uses introspection if you wish to use your existing Java classes to represent Avro data. There are three representations in Java: generic (uses a Map for records and a List for arrays), specific (generates a Java class for each Avro record, like Thrift), and reflect (uses reflection to access existing classes). So introspection is optional. And, while introspection is indeed slow for processing file-based data, it would probably not be a bottleneck for most RPC protocols, and it might be a useful tool for migrating existing code to Avro.
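[The generic-vs-reflect distinction above can be sketched in a few lines. This is not Avro's actual API; it is a stdlib-only Python illustration, with a hypothetical `User` class and `to_generic` helper, of how a "reflect" representation can read fields off an existing application class while the "generic" representation is just a schema-keyed map.]

```python
import json

# A record schema written in Avro's JSON schema style (field names invented here).
USER_SCHEMA = {
    "type": "record",
    "name": "User",
    "fields": [{"name": "name", "type": "string"},
               {"name": "clicks", "type": "int"}],
}

def to_generic(obj, schema):
    """'Reflect'-style access: pull fields off an existing object by name,
    yielding the 'generic' representation (a plain dict keyed by field name)."""
    return {f["name"]: getattr(obj, f["name"]) for f in schema["fields"]}

class User:
    """An existing application class, untouched by any code generation."""
    def __init__(self, name, clicks):
        self.name = name
        self.clicks = clicks

record = to_generic(User("ada", 3), USER_SCHEMA)
print(json.dumps(record))  # {"name": "ada", "clicks": 3}
```

[Nothing here requires the class to be regenerated when the schema changes, which is why reflection is only needed when you want to keep using pre-existing classes.]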
> So I guess my question is why Avro?

The compelling case is dynamic data types. Pig, Hive, Python, Perl, etc. scripts should not have to generate a Thrift IDL file each time they wish to write a data file with a new schema, nor should they need to run the Thrift compiler for each data file they wish to read. For production applications, code generation is not an imposition and may offer increased opportunities for optimization and error checking, but for exploration and experimentation, a very common use case for Hadoop, one would like to be able to browse datasets and build MapReduce programs more interactively.

Doug
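[The dynamic-schema use case described above can be sketched as follows. This is not Avro's real container format (which is binary); it is a stdlib-only Python illustration, with hypothetical `write_dataset`/`read_dataset` helpers, of the key idea: the schema is built at runtime and travels with the data, so a script can write and read a new record shape with no IDL file and no compiler run.]

```python
import io
import json

def write_dataset(fp, schema, records):
    """Self-describing file: the schema on the first line, then one record
    per line as a JSON list of field values in schema order."""
    fp.write(json.dumps(schema) + "\n")
    for r in records:
        fp.write(json.dumps([r[f["name"]] for f in schema["fields"]]) + "\n")

def read_dataset(fp):
    """Read the schema back first, then decode records against it --
    no generated classes needed on the reading side either."""
    schema = json.loads(fp.readline())
    names = [f["name"] for f in schema["fields"]]
    return schema, [dict(zip(names, json.loads(line))) for line in fp]

# Schema invented at runtime -- e.g. by a Pig or Hive script.
schema = {"type": "record", "name": "Hit",
          "fields": [{"name": "url", "type": "string"},
                     {"name": "count", "type": "long"}]}
buf = io.StringIO()
write_dataset(buf, schema, [{"url": "/a", "count": 2}])
buf.seek(0)
print(read_dataset(buf)[1])  # [{'url': '/a', 'count': 2}]
```

[Because the reader recovers the schema from the file itself, browsing an unfamiliar dataset interactively needs no per-file code-generation step.]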