incubator-couchdb-user mailing list archives

From Adam Kocoloski <kocol...@apache.org>
Subject Re: Compaction Strategies
Date Wed, 02 Mar 2011 22:14:16 GMT
On Mar 2, 2011, at 2:33 PM, Wayne Conrad wrote:

> We run a compaction script that compacts every database every night. Compaction of our
> biggest (0.6 TB) database took about 10 hours today. Granted, the hardware has poor I/O bandwidth,
> but even if we improve the hardware, a change in strategy could be good.  Along with splitting
> that database into more manageable pieces, I hope to write a compaction script that only compacts
> a database sometimes (a la Postgresql's autovacuum).  To do that, I want some way to estimate
> whether there's anything to gain from compacting any given database.
> 
> I thought I could use the doc_del_count returned by GET /<database-name> as a gauge
> of whether to compact or not, but in my tests doc_del_count remained the same after compaction.
> Are there any statistics, however imperfect, that could help my code guess when compaction
> ought to be done?
> 
> Best Regards,
> Wayne Conrad

Hi Wayne, I don't think there's a satisfactory solution to this at the moment, which is why
I've been working with Bob Dionne to add some more detailed statistics to help inform that
kind of decision-making.  The idea is to add a new field to the response to GET /dbname (and
GET /dbname/_design/dname/_info) which will report the number of bytes allocated for storage of
"user data", i.e. the latest versions of document bodies and attachments in databases, and KV
pairs and reductions in view indexes.  You could then write a script to trigger compaction if
the ratio of "data_size" to disk_size drops below a threshold.
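
A minimal sketch of such a script, in Python, might look like the following. It assumes
the proposed field ends up being named "data_size" and sits next to disk_size in the
GET /dbname response; that field is not available yet, so treat the name as an assumption.
The function names, the base URL and the 0.5 threshold are hypothetical, and authentication
is left out. Compaction itself is triggered with the existing POST /dbname/_compact call.

```python
import json
import urllib.request


def should_compact(info, threshold=0.5):
    """Return True if data_size / disk_size is below the threshold.

    `info` is the decoded JSON from GET /dbname. "data_size" is the
    proposed (assumed) field name; if it is missing, or disk_size is
    zero or missing, we decline to compact.
    """
    data_size = info.get("data_size")
    disk_size = info.get("disk_size")
    if data_size is None or not disk_size:
        return False
    return data_size / disk_size < threshold


def fetch_db_info(base_url, db):
    """Fetch the database info document (GET /dbname)."""
    with urllib.request.urlopen(f"{base_url}/{db}") as resp:
        return json.load(resp)


def trigger_compaction(base_url, db):
    """Start compaction (POST /dbname/_compact with a JSON content type)."""
    req = urllib.request.Request(
        f"{base_url}/{db}/_compact",
        data=b"",
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def compact_if_needed(base_url, db, threshold=0.5):
    """Compact `db` only when the estimated live-data ratio is low."""
    info = fetch_db_info(base_url, db)
    if should_compact(info, threshold):
        trigger_compaction(base_url, db)
        return True
    return False
```

A nightly job could then call compact_if_needed("http://localhost:5984", "mydb") for each
database (both arguments are placeholders) and skip the ones where most of the file is
still live data.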

Bob has a pull request in process for BigCouch; the changes he's making should apply to CouchDB
as well with a little tweaking.

By the way, you're right that doc_del_count does not change before and after compaction. The
document bodies are removed, but a small record is retained for the purposes of replication
and in-progress view index updates.

Regards,
Adam