From: "Enns, Steven"
To: "user@avro.apache.org"
Subject: Re: map/reduce of compressed Avro
Date: Mon, 29 Apr 2013 22:53:49 +0000

Out of curiosity, are there any other file formats that provide splittable
gzip compression like Avro object containers? I can only think of Sequence
Files.

On 4/29/13 3:47 PM, "Scott Carey" wrote:

>Martin said it already, but I will emphasize:
>
>Avro data files are splittable and can support multiple mappers no matter
>what codec is used for compression. This is because avro files are block
>based, and only use the compression within the block. I recommend
>starting with gzip compression, and moving to snappy only if deflate
>compression level '1' is not fast enough.
>
>For more information on avro data files, see:
>http://avro.apache.org/docs/current/spec.html#Object+Container+Files
>
>On 4/22/13 11:47 PM, "nir_zamir" wrote:
>
>>Thanks Martin.
>>
>>What will happen if I try to use an indexed LZO-compressed avro file?
>>Will it work and utilize the index to allow multiple mappers?
>>
>>I think that for Snappy for example, the file is splittable and can use
>>multiple mappers, but I haven't tested it yet - would be glad if anyone
>>has any experience with that.
>>
>>Thanks!
>>Nir.
>>
>>--
>>View this message in context:
>>http://apache-avro.679487.n3.nabble.com/map-reduce-of-compressed-Avro-tp4026947p4027009.html
>>Sent from the Avro - Users mailing list archive at Nabble.com.
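
For anyone who wants to see why block-level compression keeps a file
splittable, here is a rough Python sketch of the idea Scott describes. It is
not the real Avro container format: the layout (a 4-byte length prefix, raw
deflate data, then a 16-byte sync marker) and the names write_container and
read_split are made up for illustration. Each block is compressed on its own,
so a reader handed an arbitrary byte range can scan to the next sync marker
and start decompressing there.

```python
import zlib

SYNC_LEN = 16  # Avro also uses a 16-byte sync marker; the rest of this layout is a toy


def write_container(records, sync, per_block=3, level=1):
    """Toy layout: sync, then repeated [4-byte length][raw deflate data][sync].

    Records are joined with newlines, so they must not contain newlines.
    """
    out = bytearray(sync)
    for i in range(0, len(records), per_block):
        payload = "\n".join(records[i:i + per_block]).encode("utf-8")
        # wbits=-15 produces raw deflate; level 1 trades ratio for speed.
        comp = zlib.compressobj(level, zlib.DEFLATED, -15)
        data = comp.compress(payload) + comp.flush()
        out += len(data).to_bytes(4, "big") + data + sync
    return bytes(out)


def read_split(buf, sync, start, end):
    """Return records of every block whose preceding sync marker starts in [start, end)."""
    records = []
    marker = buf.find(sync, start)  # scan forward to the first marker in this split
    while marker != -1 and marker < end:
        pos = marker + SYNC_LEN
        if pos >= len(buf):
            break  # that was the trailing marker, no block follows
        size = int.from_bytes(buf[pos:pos + 4], "big")
        data = buf[pos + 4:pos + 4 + size]
        # Each block decompresses independently of everything before it.
        records += zlib.decompress(data, -15).decode("utf-8").split("\n")
        marker = pos + 4 + size  # the next marker directly follows the block
    return records


if __name__ == "__main__":
    sync = bytes(range(100, 116))  # fixed marker for the demo; a real writer picks a random one
    recs = [f"record-{i}" for i in range(10)]
    buf = write_container(recs, sync)
    mid = len(buf) // 2
    # Two "mappers", each given half of the byte range.
    print(read_split(buf, sync, 0, mid))
    print(read_split(buf, sync, mid, len(buf)))
```

Because a block belongs to the split in which its preceding marker starts,
any cut point assigns every block to exactly one reader. A file compressed as
a single gzip stream has no such restart points, which is why it cannot be
split this way.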