From: Scott Carey
To: "user@avro.apache.org"
Date: Wed, 30 Mar 2011 18:51:10 -0700
Subject: Re: Avro speed comparison with raw logs
In-Reply-To: <4D712095.1090007@apache.org>

gzip/deflate is approximately the same speed to decompress for all compression levels. However, for compression, it varies by a factor of 5 or so between the fastest (1) and slowest (9).

This is a useful link for gzip performance characteristics:
http://tukaani.org/lzma/benchmarks.html

On 3/4/11 9:25 AM, "Doug Cutting" wrote:

>On 03/01/2011 09:05 PM, felix gao wrote:
>> I am running some comparison tests on a data set that I converted to
>> avro with deflator set to level 6. The original logs consists of 2880
>> uncompressed http access logs with a total size of 1.4TB. The Compressed
>> avro log is about 2/3 of the size. However, when I ran the same pig job
>> on the raw logs, it is blazing fast during the initial map phase.
>> Finished in under 40 min. When I ran the same pig job with avro files,
>> the initial map phase took 8 minutes to only finish 10%. I am wondering
>> is there any way to figure out what is slowing down the map?
>
>What version of Avro are you using? How are you integrating Avro with
>Pig?
>
>Also, for speed, you might try level=1 (Deflater.BEST_SPEED).
>
>Doug
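
To check the compression-level point on your own hardware, a rough sketch like the following may help. It uses only java.util.zip (Deflater and Inflater) and times compression and decompression at levels 1, 6 and 9. The class name DeflateLevels, the sampleData helper and the synthetic log lines are assumptions made up for illustration; this is not how Avro or Pig invoke the codec, and a single un-warmed timing run is only a rough indication.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class DeflateLevels {

    // Hypothetical stand-in for access-log data: repetitive ASCII lines.
    static byte[] sampleData(int lines) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lines; i++) {
            sb.append("10.0.0.").append(i % 256)
              .append(" - - \"GET /item/").append(i * 31 % 9973)
              .append(" HTTP/1.1\" 200 ").append(512 + i % 4096)
              .append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    // Compress the whole input at the given deflate level (1..9).
    static byte[] compress(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream(input.length / 2 + 64);
            byte[] buf = new byte[64 * 1024];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib memory
        }
    }

    // Decompress into a buffer of the known original length.
    static byte[] decompress(byte[] compressed, int originalLength) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            byte[] result = new byte[originalLength];
            int off = 0;
            while (!inflater.finished() && off < result.length) {
                int n = inflater.inflate(result, off, result.length - off);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated input or preset dictionary required
                }
                off += n;
            }
            return Arrays.copyOf(result, off);
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt deflate stream", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        byte[] data = sampleData(200_000);
        int[] levels = {Deflater.BEST_SPEED, 6, Deflater.BEST_COMPRESSION};
        for (int level : levels) {
            long t0 = System.nanoTime();
            byte[] packed = compress(data, level);
            long t1 = System.nanoTime();
            byte[] unpacked = decompress(packed, data.length);
            long t2 = System.nanoTime();
            System.out.printf("level=%d size=%d compress=%.1f ms decompress=%.1f ms ok=%b%n",
                    level, packed.length, (t1 - t0) / 1e6, (t2 - t1) / 1e6,
                    Arrays.equals(data, unpacked));
        }
    }
}
```

Running the loop several times and discarding the first pass gives steadier numbers, since the JIT has not warmed up on the first iteration. Real log data will compress differently from these synthetic lines, so treat the output as a relative comparison between levels only.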