From: Scott Carey
To: "user@avro.apache.org"
Reply-To: user@avro.apache.org
Date: Thu, 22 Aug 2013 14:35:08 -0700
Subject: Re: Avro file Compression

The file format compresses in blocks, and the block size is configurable. Compression works across all the objects in a block, so it is effective for small objects as well as large ones, as long as the total block size is large enough.

I have found that I can increase the compression ratio by ordering the objects carefully so that neighboring records have more in common.

From: Bill Baird <bill.baird@traxtech.com>
Reply-To: "user@avro.apache.org" <user@avro.apache.org>
Date: Thursday, August 22, 2013 7:47 AM
To: "user@avro.apache.org" <user@avro.apache.org>
Subject: Re: Avro file Compression

As with any compression, how much you get depends on the size and nature of the data. I have objects that take 4 or 5k unserialized and serialize to 1.5 to 3k, or about 2 to 1. However, for the same object structure (which contains several nested arrays ... lots of strings, numbers ... basic business data), when the uncompressed data is 17MB, it deflates to 1MB (or 17 to 1). For very small objects, deflate will actually produce a larger output, but it does quite well as the size of the data being deflated grows.

Bill

On Wed, Aug 21, 2013 at 11:31 PM, Harsh J <harsh@cloudera.com> wrote:
> Can you share your test? There is an example at
> http://svn.apache.org/repos/asf/avro/trunk/lang/c/examples/quickstop.c
> which has the right calls for using a file writer with a deflate codec
> - is yours similar?
>
> On Mon, Aug 19, 2013 at 9:42 PM, amit nanda <amitwip@gmail.com> wrote:
> > I am trying to compress the Avro files that I am writing. For that I am
> > using the latest Avro C with the "deflate" option, but I am not able to
> > see any difference in the file size.
> >
> > Is there any special type of data that this works on, or is there any
> > other setting that needs to be configured for this to work?
>
> --
> Harsh J
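The two effects described in this thread (compressing across the objects in a block, and ordering records so that neighbors have more in common) can be sketched outside of Avro. The following is only an illustration under stated assumptions: it uses Python's zlib module as a stand-in for the deflate codec rather than any Avro API, the records are synthetic, and the names make_records, per_record_size, blocked_size and records_per_block are hypothetical. Counting a block in records is a simplification made here for readability.

```python
import json
import random
import zlib


def make_records(groups=500, per_group=8, seed=0):
    """Build small synthetic records (hypothetical data, not from the thread).

    Records in the same group share a long "account" value, so they have a
    lot in common with each other and little in common with other groups.
    The result is shuffled so that related records are scattered.
    """
    rng = random.Random(seed)
    records = []
    for _ in range(groups):
        account = "%032x" % rng.getrandbits(128)  # shared by the whole group
        for _ in range(per_group):
            rec = {
                "account": account,
                "status": rng.choice(["open", "closed", "pending"]),
                "amount": rng.randint(1, 500),
            }
            records.append(json.dumps(rec, sort_keys=True).encode("utf-8"))
    rng.shuffle(records)
    return records


def per_record_size(records):
    """Total size when every record is deflated on its own."""
    return sum(len(zlib.compress(r)) for r in records)


def blocked_size(records, records_per_block):
    """Total size when records are concatenated into blocks and each
    block is deflated as a unit, so matches can span records."""
    total = 0
    for start in range(0, len(records), records_per_block):
        block = b"".join(records[start:start + records_per_block])
        total += len(zlib.compress(block))
    return total


if __name__ == "__main__":
    records = make_records()
    print("raw bytes:            ", sum(len(r) for r in records))
    print("deflated per record:  ", per_record_size(records))
    print("deflated in blocks:   ", blocked_size(records, 1000))
    # Sorting the serialized bytes places records with the same account
    # next to each other before they are grouped into blocks.
    print("sorted, then blocked: ", blocked_size(sorted(records), 1000))
```

Running the script prints the four totals for comparison. On this synthetic data the blocked total comes out smaller than the per-record total, and sorting before blocking reduces it further, because deflate can only reuse matches that fall inside its sliding window. Real data and real Avro files will behave differently in degree, so treat the numbers as a way to reason about block size and record order, not as a prediction.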