couchdb-user mailing list archives

From Paul Davis
Subject Re: Compaction
Date Thu, 08 Mar 2012 21:39:53 GMT
On Thu, Mar 8, 2012 at 1:21 PM, Matthieu Rakotojaona wrote:
> Hello everyone,
> I discovered couchDB a few months ago, and decided to dive in just
> recently. I don't want to be long, but couchDB is Amazing. True offline
> mode/replication, JSON over HTTP, MVCC, MapReduce and other concepts
> widened my horizon of how to solve a problem, and I'm really grateful.
> There is a point though that I find sad : the documentation available on
> the interwebs are somewhat scarce. Sure you can find yourself because
> couchDB is so easy, but there's a particular point that I found to be
> especially undocumented : compaction.
> Basically all I could find was :
> * If you want to compact your db :
>        > POST /db/_compact
> * If you want to compact your design :
>        > POST /db/_compact/designname
>        (which seems to say that you can only compact all your views at once
>        or none, but not a particular one)

Slightly more specific: Compaction for views is done for all the views
in the specified design document. Also, view compaction is in general
much more efficient than database compaction.
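
For reference, a minimal sketch of building the two requests quoted above with Python's standard library. The helper name `compaction_request` and the base URL `http://localhost:5984` are assumptions made for illustration, as is the `Content-Type: application/json` header, which CouchDB generally expects on these POSTs.

```python
import urllib.request

def compaction_request(base_url, db, design=None):
    """Build (but do not send) a compaction request.

    With design=None this targets POST /db/_compact (database compaction);
    otherwise POST /db/_compact/designname (all views in that design doc).
    """
    url = f"{base_url}/{db}/_compact"
    if design is not None:
        url += f"/{design}"
    return urllib.request.Request(
        url,
        data=b"",  # empty body; the request only triggers the task
        method="POST",
        headers={"Content-Type": "application/json"},
    )

# Hypothetical server address and names, for illustration only.
db_req = compaction_request("http://localhost:5984", "db")
view_req = compaction_request("http://localhost:5984", "db", "designname")
```

Passing either object to `urllib.request.urlopen()` would send it to a running server.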

> * Although specially designed like that, the absence of automatic
>  compaction is seen as unneeded, and a number of people run it with
>  cron jobs

There's an auto compactor in trunk now.

> * The real effect of a compaction (ie the real size you are going to
>        earn) seems to be unknown by many people. Someone (I don't remember
>        your name, but thank you) came with a patch to display the data_size,
>        which is the real size of your data on disk; this looks hackish.

Which part looks hackish?

> And the initial purpose to my mail comes here. I just added a few
> documents in my db (1.7+M) and found that the disk_size gives me ~2.5GB.
> while the data_size is around 660 Mo. From what I read, a compaction is
> supposed to leave you with data_size ~= disk_size; yet, after numerous
> compaction, it doesn't shrink a bit.

I bet you have random document ids which will indeed cause the
database file to end up with a significant amount of garbage left
after compaction. I'll describe why below.
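
To put a number on the gap reported in the quoted message, here is a small sketch that computes how much of the file is not live data. The function name `overhead_fraction` is made up for this example, and the byte counts are rounded from the approximate figures in the mail (~2.5GB and ~660 Mo, read as decimal units).

```python
def overhead_fraction(db_info):
    """Fraction of the file on disk that is not live data.

    db_info is a dict with the disk_size and data_size fields, both in bytes.
    """
    return 1.0 - db_info["data_size"] / db_info["disk_size"]

# Approximate figures from the quoted message (assumed decimal units).
info = {"disk_size": 2_500_000_000, "data_size": 660_000_000}
fraction = overhead_fraction(info)
print(f"{fraction:.1%} of the file is not live document data")
```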

> I suppose the problem is exactly the same with views; I'm building it at
> the moment, so I will test it later.

Technically yes, but in general no. More below.

> I also would like to understand the process of compaction. All I could
> see was :
> 1. couchdb parses the entire DB, fetching only the last (or the few
>         last, from parameters) revision of each document
> 2. it assembles them in a db.compact.couch file
> 3. when finished, db.compact.couch replaces db.couch

In broad strokes. Currently, CouchDB compacts like such:

1. Iterate over docs in order of the update_sequence
2. Read document from the id_btree
3. Write doc to both the update sequence and id indexes in the compaction file
4. When finished, delete the .couch file and rename .couch.compact -> .couch

It's a bit more complicated than that due to buffering of docs to
improve throughput and whatnot, but those are the important details.
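
The steps above can be sketched as a toy model in Python. This is not CouchDB's Erlang code: the "file" is a plain list, the two indexes are dicts, and all names are invented for illustration. It only shows the order of operations, including the one id lookup per document.

```python
def compact(old_file, seq_index, id_index):
    """Toy compaction: copy only live documents into a new 'file'.

    old_file  : list of document records (may contain stale revisions)
    seq_index : dict update_seq -> doc id (latest update per doc)
    id_index  : dict doc id -> offset of the live record in old_file
    """
    new_file, new_seq_index, new_id_index = [], {}, {}
    for seq in sorted(seq_index):            # 1. iterate in update_seq order
        doc_id = seq_index[seq]
        doc = old_file[id_index[doc_id]]     # 2. one id lookup per document
        new_file.append(doc)                 # 3. write the doc and update
        new_seq_index[seq] = doc_id          #    both indexes in the
        new_id_index[doc_id] = len(new_file) - 1  # compaction file
    # 4. a real compactor would now swap the new file in for the old one
    return new_file, new_seq_index, new_id_index

# Document "a" was updated once, so offset 0 holds a stale revision.
old_file = [{"_id": "a", "v": 1}, {"_id": "b", "v": 1}, {"_id": "a", "v": 2}]
seq_index = {2: "b", 3: "a"}
id_index = {"a": 2, "b": 1}
new_file, new_seq_index, new_id_index = compact(old_file, seq_index, id_index)
```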

The issue is twofold. First, reading the docs in order of the update
sequence and then fetching them using the id btree means we're
incurring a btree lookup per doc. There's a patch in BigCouch that
addresses this by duplicating a record in both trees. It's been shown
to have significant speedups for compaction and replication both at
the expense of storing more data (basically it has two copies of the
revision tree, but importantly does not duplicate the actual JSON body
of the document). While not directly size related in itself, it leads
us to the second issue.

Namely, writing both indexes simultaneously introduces garbage into
the .compact file if the order of document ids in the update_seq is
random. I.e., if you wrote the same documents to two databases, one
with ids that were monotonically increasing (say, "%0.20d" % i) and
one with random document ids, and then compacted both, the database
with random ids would use significantly more disk space after
compaction (as well as take longer to compact).
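
A rough way to see the difference is a toy model in Python. It is not a real B-tree and not CouchDB's storage engine: it assumes an append-only tree with a fixed final shape, where each batch of writes re-appends every node on the path from each touched leaf to the root. The fanout, batch size and function name are arbitrary choices for illustration.

```python
import random

def nodes_rewritten(insert_order, fanout=16, batch=64):
    """Count node writes in a toy append-only tree.

    insert_order lists each key's rank in sorted id order, in the order the
    keys are written. A key with rank r lives in leaf r // fanout.
    """
    n = len(insert_order)
    total = 0
    for start in range(0, n, batch):
        touched = {key // fanout for key in insert_order[start:start + batch]}
        level_size = (n + fanout - 1) // fanout  # node count at this level
        while True:
            total += len(touched)  # every touched node is appended again
            if level_size <= 1:    # that was the root
                break
            touched = {node // fanout for node in touched}
            level_size = (level_size + fanout - 1) // fanout
    return total

keys = list(range(4096))            # ids arriving in sorted order
shuffled = keys[:]
random.Random(0).shuffle(shuffled)  # ids arriving in random order

sorted_writes = nodes_rewritten(keys)
random_writes = nodes_rewritten(shuffled)
print(sorted_writes, random_writes)
```

In this model the sorted run touches a handful of adjacent nodes per batch, while the shuffled run touches nodes scattered across the whole tree. Every rewritten node leaves its previous copy behind as garbage in an append-only file.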

The issue here is that when we update the id tree with random doc ids
we end up rewriting more of the internal nodes (append only storage)
which causes more garbage to accumulate. Although, all hope is not lost.

There's a second set of two patches in BigCouch that I wrote to
address this specifically. The first patch changes the compactor to
use a temporary file for the id btree. Then just before compaction
finishes, this tree is streamed back into the .compact file (in sorted
order so that internal garbage is minimized). This helps tremendously
for databases with random document ids (sorted ids are already
~optimal for this scheme). The second patch in the set uses an
external merge sort on the temporary file which helps speed up the

Depending on the dataset these improvements can have massive gains for
post-compaction data sizes as well as time required for compaction. I
plan on pulling these back into CouchDB in the coming months as we
work on merging BigCouch back into CouchDB so hopefully by end of
summer they'll be in master for everyone to enjoy.
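
For readers unfamiliar with the technique, here is a minimal sketch of an external merge sort over document ids in Python. It is not the BigCouch patch, only the general idea: sort fixed-size runs, spill each run to a temporary file, then stream a k-way merge so the ids come back in sorted order. The function name and `run_size` default are assumptions, and ids are assumed not to contain newlines.

```python
import heapq
import tempfile

def external_sort(ids, run_size=1000):
    """Yield the ids in sorted order using sorted runs spilled to temp files."""
    runs = []
    try:
        for start in range(0, len(ids), run_size):
            run = tempfile.TemporaryFile(mode="w+")
            runs.append(run)
            # Sort one run in memory and write it out, one id per line.
            run.writelines(
                doc_id + "\n" for doc_id in sorted(ids[start:start + run_size])
            )
            run.seek(0)
        # heapq.merge lazily merges the already-sorted runs line by line.
        for line in heapq.merge(*runs):
            yield line.rstrip("\n")
    finally:
        for run in runs:
            run.close()

# Fixed-width hex ids in scrambled order, standing in for random doc ids.
doc_ids = [format((i * 7919) % 10007, "08x") for i in range(50)]
streamed = list(external_sort(doc_ids, run_size=8))
```

Only one run has to fit in memory at a time, which is why the approach suits id sets much larger than RAM.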

As to views, they don't really require these improvements because
their indexes are always streamed in sorted order. So it's both fast
and close-ish to optimal. Although somewhere I had a patch that
changed the index builds to be actually optimal based on ideas from
Filipe but as I recall it wasn't a super huge win so I didn't actually
commit it.

> So I wondered :
> * Can you launch a compaction, halt it and continue it later ?

While you can resume compaction, there's no API for pausing or
canceling it. There's actually a really neat way in Erlang to do
this that we've mentioned occasionally adding to the active tasks API
but no one has gotten around to adding it.

> * If yes, can you move the temporary db.compact.couch file somewhere
>        else and link to it so that couchdb thinks nothing has changed ?

I'm not sure what you mean here.

> Thank you,
> --
