From: Doug Cutting
Date: Fri, 03 Apr 2009 15:44:57 -0700
To: general@hadoop.apache.org
Subject: Re: [PROPOSAL] new subproject: Avro
Message-ID: <49D69169.2070208@apache.org>
References: <49D53694.1050906@apache.org> <3DBB0E23-7B6D-4541-A0D6-3687DF02C397@yahoo-inc.com>

Jim Kellerman (POWERSET) wrote:
> It is also my understanding (based on the email thread) that Avro only
> supports Java and python. That is a step backwards from Thrift.

We intend to add support for more languages. Avro is not complete.

> It appears that Avro uses introspection heavily, which is expensive in
> applications that require a high message rate.

It only uses introspection if you wish to use your existing Java classes to represent Avro data. There are three representations in Java: generic (uses a Map for records and a List for arrays), specific (generates a Java class for each Avro record, like Thrift), and reflect (uses reflection to access existing classes). So introspection is optional. And, while introspection is indeed slow for processing file-based data, it would probably not be a bottleneck for most RPC protocols, and it might be a useful tool for migrating existing code to Avro.
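[The generic-vs-reflect distinction above can be sketched in a few lines. This is not Avro's actual API; it is a stdlib-only Python illustration, with a hypothetical `User` class and `to_generic` helper, of how a "reflect" representation can read fields off an existing application class while the "generic" representation is just a schema-keyed map.]

```python
import json

# A record schema written in Avro's JSON schema style (field names invented here).
USER_SCHEMA = {
    "type": "record",
    "name": "User",
    "fields": [{"name": "name", "type": "string"},
               {"name": "clicks", "type": "int"}],
}

def to_generic(obj, schema):
    """'Reflect'-style access: pull fields off an existing object by name,
    yielding the 'generic' representation (a plain dict keyed by field name)."""
    return {f["name"]: getattr(obj, f["name"]) for f in schema["fields"]}

class User:
    """An existing application class, untouched by any code generation."""
    def __init__(self, name, clicks):
        self.name = name
        self.clicks = clicks

record = to_generic(User("ada", 3), USER_SCHEMA)
print(json.dumps(record))  # {"name": "ada", "clicks": 3}
```

[Nothing here requires the class to be regenerated when the schema changes, which is why reflection is only needed when you want to keep using pre-existing classes.]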
> So I guess my question is why Avro?

The compelling case is dynamic data types. Pig, Hive, Python, Perl, etc. scripts should not have to generate a Thrift IDL file each time they wish to write a data file with a new schema, nor should they need to run the Thrift compiler for each data file they wish to read. For production applications, code generation is not an imposition and may offer increased opportunities for optimization and error checking, but for exploration and experimentation, a very common use case for Hadoop, one would like to be able to browse datasets and build MapReduce programs more interactively.

Doug
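[The dynamic-schema use case described above can be sketched as follows. This is not Avro's real container format (which is binary); it is a stdlib-only Python illustration, with hypothetical `write_dataset`/`read_dataset` helpers, of the key idea: the schema is built at runtime and travels with the data, so a script can write and read a new record shape with no IDL file and no compiler run.]

```python
import io
import json

def write_dataset(fp, schema, records):
    """Self-describing file: the schema on the first line, then one record
    per line as a JSON list of field values in schema order."""
    fp.write(json.dumps(schema) + "\n")
    for r in records:
        fp.write(json.dumps([r[f["name"]] for f in schema["fields"]]) + "\n")

def read_dataset(fp):
    """Read the schema back first, then decode records against it --
    no generated classes needed on the reading side either."""
    schema = json.loads(fp.readline())
    names = [f["name"] for f in schema["fields"]]
    return schema, [dict(zip(names, json.loads(line))) for line in fp]

# Schema invented at runtime -- e.g. by a Pig or Hive script.
schema = {"type": "record", "name": "Hit",
          "fields": [{"name": "url", "type": "string"},
                     {"name": "count", "type": "long"}]}
buf = io.StringIO()
write_dataset(buf, schema, [{"url": "/a", "count": 2}])
buf.seek(0)
print(read_dataset(buf)[1])  # [{'url': '/a', 'count': 2}]
```

[Because the reader recovers the schema from the file itself, browsing an unfamiliar dataset interactively needs no per-file code-generation step.]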