From: Lin Guo
To: user@pig.apache.org
Date: Thu, 9 Dec 2010 00:20:06 -0800
Subject: Re: comments appreciated for pig AvroStorage

Hi Jeff,

We previously compared Avro with binary JSON (LinkedIn's serialization system, which uses a JSON data model but a more compact byte format; details at https://github.com/voldemort/voldemort/wiki/Binary-JSON-Serialization):

1. Avro's in-memory serialization performance is 71% of binary JSON's.
2. Avro's in-memory deserialization performance is 76% of binary JSON's.
3. On-disk serialization performance depends heavily on the compression algorithm.
4. Uncompressed, Avro is more space-efficient than binary JSON. I didn't run many experiments for this case; using a couple of data sets, I got a ratio of 62.5%.

Best,
Lin

On Tue, Nov 30, 2010 at 9:42 PM, Jeff Zhang wrote:
> Lin,
>
> Great work. So you're already using it at LinkedIn? And how does
> AvroStorage perform compared to other Storage implementations?
>
> On Wed, Dec 1, 2010 at 1:05 PM, Lin Guo wrote:
>> Hi,
>>
>> We'd like to contribute our Pig AvroStorage function as a patch and
>> would highly appreciate any kind of comments.
>>
>> doc:
>> http://snaprojects.jira.com/wiki/display/HTOOLS/AvroStorage+-+Pig+support+for+Avro+data
>>
>> jira:
>> https://issues.apache.org/jira/browse/PIG-1748
>>
>> Many thanks,
>> Lin
>
> --
> Best Regards
>
> Jeff Zhang