incubator-couchdb-user mailing list archives

From Adam Kocoloski <kocol...@apache.org>
Subject Re: using gzip for db and view indexes
Date Fri, 18 Jun 2010 11:27:20 GMT
On Jun 17, 2010, at 6:00 PM, Norman Barker wrote:

> Hi,
> 
> I am looking at the CouchDB database and view index directories and I
> see the files are saved as binary. My indexes and databases are getting
> fairly large, so I tried gzipping them by hand, and it made a big
> difference (at least for my data).
> 
> Looking at
> 
> http://www.erlang.org/doc/man/file.html
> 
> I see that compressed is an option when opening a file for reading or
> writing. Is it worth trying this out? Could it be an option in the ini
> file, so we could trade off database size against a possible lag in access?
> 
> I can look into this. Does everything go through the couch_file
> module, and is there a suitable test dataset that we can analyse
> performance with?
> 
> thanks,
> 
> Norman

Hi Norman,

I'd support making gzip compression a config option.  Yes, everything goes through
couch_file, so adding a compression flag to the term_to_binary calls in append_term and
append_term_md5 (term_to_binary/2 accepts a compressed option) would get you there.

You should search the archives for a discussion about this.  We used to compress the terms,
and IIRC it almost cut the file size in half.  However, it also introduced a measurable drop
in write throughput.  That's a tradeoff I'm sure some folks would be willing to make.
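
To get a rough feel for that tradeoff before touching couch_file, something like the sketch below could help. It is only an illustration in Python, with zlib standing in for Erlang's term compression. The sample payloads, the measure helper, and the "btree-like" key lists are all made up for this example and are not CouchDB's on-disk format, so the numbers only mean something once real data is substituted.

```python
import json
import time
import zlib


def measure(payloads, level=6):
    """Return (raw_bytes, compressed_bytes, seconds) for a list of byte strings."""
    raw = sum(len(p) for p in payloads)
    start = time.perf_counter()
    packed = [zlib.compress(p, level) for p in payloads]
    elapsed = time.perf_counter() - start
    compressed = sum(len(c) for c in packed)
    # Sanity check: every payload must survive a round trip unchanged.
    assert all(zlib.decompress(c) == p for c, p in zip(packed, payloads))
    return raw, compressed, elapsed


# Hypothetical stand-ins for the two kinds of data written to disk.
doc_bodies = [
    json.dumps({
        "_id": f"doc-{i}",
        "type": "reading",
        "station": "alpha",
        "value": i % 17,
        "notes": "sample text " * 8,
    }).encode()
    for i in range(1000)
]
node_like = [
    json.dumps([[f"key-{i + j:08d}", i + j] for j in range(50)]).encode()
    for i in range(0, 1000, 50)
]

for name, payloads in (("document bodies", doc_bodies),
                       ("btree-like nodes", node_like)):
    raw, compressed, elapsed = measure(payloads)
    print(f"{name}: {raw} -> {compressed} bytes "
          f"({compressed / raw:.0%} of original), {elapsed * 1000:.1f} ms")
```

Running it with different level values gives a crude picture of how much extra CPU time each step of compression costs against the bytes it saves.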

One other interesting thing to investigate might be to have separate compression settings
for document bodies and btree nodes.  It could be that one compresses more effectively than
the other.

Best,

Adam

