couchdb-user mailing list archives

From Paul Davis <>
Subject Re: Compaction
Date Sat, 10 Mar 2012 19:25:51 GMT
On Sat, Mar 10, 2012 at 1:01 PM, Matthieu Rakotojaona
<> wrote:
> Hello,
> Wow, thank you for the very comprehensive answer.
> On Thu, Mar 8, 2012 at 10:39 PM, Paul Davis <> wrote:
>>> And the initial purpose to my mail comes here. I just added a few
>>> documents in my db (1.7M+) and found that disk_size gives me ~2.5 GB,
>>> while data_size is around 660 MB. From what I read, a compaction is
>>> supposed to leave you with data_size ~= disk_size; yet, after numerous
>>> compactions, it doesn't shrink a bit.
>> I bet you have random document ids which will indeed cause the
>> database file to end up with a significant amount of garbage left
>> after compaction. I'll describe why below.
> Yup. I already had my ids, but they were not ordered as I read through
> the file. Now that CouchDB stores my rows with its own generated IDs
> (with the 'sequential' algorithm), the new size of my whole DB shrank
> down to 500 MB. Very neat.
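(For reference, the 'sequential' algorithm keeps a random prefix and a
monotonically increasing suffix, so consecutive ids sort next to each other
and repeatedly land in the same b-tree leaves instead of scattering updates
across the whole id tree. A rough Python sketch of the idea; the field
widths and increment range below are illustrative, not the exact
implementation:)

```python
import os
import random

class SequentialUUID:
    """Sketch of a CouchDB-style 'sequential' uuid generator.

    Ids share a random 26-hex-char prefix and end in an increasing
    6-hex-char counter, so successive ids sort adjacently.
    """

    def __init__(self):
        self.prefix = os.urandom(13).hex()    # 26 hex chars
        self.counter = random.randrange(0x1000)

    def next(self):
        # Increment by a random step: ids stay hard to guess but remain
        # monotonically ordered for this generator's lifetime.
        self.counter += random.randrange(1, 0xFFE)
        if self.counter > 0xFFFFFF:           # suffix overflow: new prefix
            self.prefix = os.urandom(13).hex()
            self.counter = random.randrange(0x1000)
        return self.prefix + format(self.counter, "06x")
```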
>>> * If yes, can you move the temporary db.compact.couch file somewhere
>>> else and link to it so that CouchDB thinks nothing has changed?
>> I'm not sure what you mean here.
> In case I see that I will lack storage space, like what happened to
> me, I would like the .compact file to be created and used in another
> disk, but I didn't see this in the config file. So I thought something
> like that would do the trick:
> 1. Launch compaction
> 2. Pause it (actually, stop the server for now)
> 3. Move the .compact created file somewhere else, and symlink to it
> 4. Continue compaction
> This flow could also be useful if we want to use an SSD to do a
> (faster) compaction, later writing the DB back to a classic HDD.
> I resorted to mounting some directory on my data disk to
> /var/lib/couchdb, which I'm not really proud of.
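Step 3 of that flow (move the .compact file to a bigger disk and symlink it
back) can be sketched like this. The paths and the function name are
illustrative, and CouchDB would still need to be stopped around the move, as
in the flow above:

```python
import os
import shutil

def relocate_compact(compact_path, target_dir):
    """Move a .compact file to another disk and symlink it back.

    shutil.move falls back to copy + unlink when target_dir is on a
    different filesystem; afterwards the original path is a symlink, so
    a resumed compaction keeps writing under the old name.
    """
    dest = os.path.join(target_dir, os.path.basename(compact_path))
    shutil.move(compact_path, dest)
    os.symlink(dest, compact_path)
    return dest
```

Whether the compactor and the final file swap behave well when the path is a
symlink is exactly the concern raised in the reply below.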
> --

On one hand this makes a lot of sense; on the other, though, it might
cause a bit of an issue for people. If we allow people to specify a
different directory that ends up on a different disk, then the atomic
rename that we rely on becomes a possibly quite lengthy copy between
two devices. Since this swap is serialized in the couch_db_updater
code, it would render a database unresponsive to any traffic during
that possibly lengthy copy. It's possible that we could have a two-step
process, but that would require a bit more trickery and I'm not sure
it'd be worth it in the general case.
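The failure mode here is that rename(2) is atomic only within a single
filesystem; across devices it fails with EXDEV and the only fallback is a
full copy. A minimal Python sketch of the distinction (the function name is
mine, not CouchDB's; the real swap happens in Erlang inside
couch_db_updater):

```python
import errno
import os
import shutil

def swap_in(compact_path, db_path):
    """Replace the live db file with the compacted one.

    On the same filesystem os.rename is a single atomic operation.
    Across filesystems it raises OSError(EXDEV), and the fallback
    degrades into the potentially lengthy copy discussed above.
    """
    try:
        os.rename(compact_path, db_path)
    except OSError as e:
        if e.errno != errno.EXDEV:
            raise
        shutil.move(compact_path, db_path)  # cross-device: copy + unlink
```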
