incubator-couchdb-user mailing list archives

From Paul Bonser <mister...@gmail.com>
Subject Re: using gzip for db and view indexes
Date Sat, 19 Jun 2010 04:25:49 GMT
On Fri, Jun 18, 2010 at 6:27 AM, Adam Kocoloski <kocolosk@apache.org> wrote:
> On Jun 17, 2010, at 6:00 PM, Norman Barker wrote:
>
>> Hi,
>>
>> I am looking at the couchdb db database and view index directory and I
>> see the files are saved as binary, my indexes and database are getting
>> fairly large so I tried gzipping them (by hand) and it made a big
>> difference (at least for my data).
>>
>> Looking at
>>
>> http://www.erlang.org/doc/man/file.html
>>
>> I see that compressed is an option when reading or writing a file, is
>> it worth trying this out, could it be an option in the ini file so we
>> could trade off database size versus a possible lag in access?
>>
>> I can look into this, does everything go through the couch_file
>> module and is there a suitable test dataset that we can analyse
>> performance with?
>>
>> thanks,
>>
>> Norman
>
> Hi Norman, I'd support making gzip compression a config option.  Yes, everything goes
> through couch_file, so adding a flag to the term_to_binary calls in append_term and
> append_term_md5 would get you there.
>
> You should search the archives for a discussion about this.  We used to compress the
> terms, and IIRC it almost cut the file size in half.  However, it also introduced a
> measurable drop in write throughput.  That's a tradeoff I'm sure some folks would be
> willing to make.

With savings that large, but at the cost of slower writes, compaction
seems like an ideal time to enable term_to_binary compression. Since
compactions can already take a significant amount of time, the best
approach might be a compress_on_compact config option that those who
prefer speed over all else could turn off. A second option could enable
compression for all writes, for those who want to save on storage at
the expense of speed.
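
As a rough sketch of what such a flag would toggle (the module name
compress_sketch and the helper encode/2 are hypothetical, not CouchDB
code; term_to_binary/2 with the compressed option and binary_to_term/1
are standard Erlang):

```erlang
-module(compress_sketch).
-export([encode/2, demo/0]).

%% Hypothetical helper: serialize a term, compressing it only when the
%% (assumed) config flag is true.
encode(Term, true)  -> term_to_binary(Term, [compressed]);
encode(Term, false) -> term_to_binary(Term).

%% Encode a made-up, highly repetitive term both ways and return
%% {PlainSize, CompressedSize} in bytes.
demo() ->
    Doc = [{<<"key">>, <<"value">>} || _ <- lists:seq(1, 1000)],
    Plain = encode(Doc, false),
    Packed = encode(Doc, true),
    %% binary_to_term/1 decodes both forms, so readers need no flag.
    Doc = binary_to_term(Plain),
    Doc = binary_to_term(Packed),
    {byte_size(Plain), byte_size(Packed)}.
```

Since decoding is the same either way, compressed and uncompressed terms
could presumably coexist in one file, which is what would make a
compact-only setting workable.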

>
> One other interesting thing to investigate might be to have separate compression
> settings for document bodies and btree nodes.  It could be that one compresses more
> effectively than the other.  Best,
>
> Adam
>
>



-- 
Paul Bonser
http://probablyprogramming.com
