couchdb-user mailing list archives

From Norman Barker <norman.bar...@gmail.com>
Subject Re: using gzip for db and view indexes
Date Fri, 18 Jun 2010 19:06:53 GMT
Adam,

I agree. As we grow our system we are probably going to want
compression in some cases. I will look into this by making the changes
in couch_file as you suggest and report back.
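
As a first pass, something like this is what I have in mind: a rough
sketch, assuming append_term/2 and append_term_md5/2 in couch_file.erl
currently just wrap term_to_binary/1, and using couch_config to read a
hypothetical file_compression_level setting from the [couchdb] section
of the ini file:

    append_term(Fd, Term) ->
        append_binary(Fd, term_to_binary(Term, compression_opts())).

    append_term_md5(Fd, Term) ->
        append_binary_md5(Fd, term_to_binary(Term, compression_opts())).

    %% Level 0 leaves the external term format uncompressed; levels 1-9
    %% trade write-side CPU for smaller output, via term_to_binary/2's
    %% compressed option.
    compression_opts() ->
        Level = list_to_integer(
                  couch_config:get("couchdb", "file_compression_level", "0")),
        [{compressed, Level}].

Since binary_to_term/1 already recognises the compressed external term
format, I don't think the read path would need any changes, and
existing uncompressed files should keep working as-is.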

Norman

On Fri, Jun 18, 2010 at 5:27 AM, Adam Kocoloski <kocolosk@apache.org> wrote:
> On Jun 17, 2010, at 6:00 PM, Norman Barker wrote:
>
>> Hi,
>>
>> I am looking at the couchdb database and view index directories and I
>> see the files are saved as binary. My indexes and database are getting
>> fairly large, so I tried gzipping them (by hand) and it made a big
>> difference (at least for my data).
>>
>> Looking at
>>
>> http://www.erlang.org/doc/man/file.html
>>
>> I see that compressed is an option when reading or writing a file. Is
>> it worth trying this out? Could it be an option in the ini file so we
>> could trade off database size against a possible lag in access?
>>
>> I can look into this. Does everything go through the couch_file
>> module, and is there a suitable test dataset that we can analyse
>> performance with?
>>
>> thanks,
>>
>> Norman
>
> Hi Norman, I'd support making gzip compression a config option.  Yes, everything goes
> through couch_file, so adding a flag to the term_to_binary calls in append_term and append_term_md5
> would get you there.
>
> You should search the archives for a discussion about this.  We used to compress the
> terms, and IIRC it almost cut the file size in half.  However, it also introduced a measurable
> drop in write throughput.  That's a tradeoff I'm sure some folks would be willing to make.
>
> One other interesting thing to investigate might be to have separate compression settings
> for document bodies and btree nodes.  It could be that one compresses more effectively than
> the other.
>
> Best,
>
> Adam
>
>
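
P.S. On the idea of separate compression settings for document bodies
and btree nodes, I could imagine the ini file growing something like
this (the names are just placeholders):

    [couchdb]
    ; 0 disables compression, 1-9 are passed straight to
    ; term_to_binary's compressed option
    doc_body_compression_level = 6
    btree_node_compression_level = 1

Document bodies are written once per revision, while btree nodes get
rewritten on every update, so a lighter level for the nodes might limit
the hit to write throughput.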
