couchdb-user mailing list archives

From Matthieu Rakotojaona <>
Subject Re: Compaction
Date Sat, 10 Mar 2012 19:58:43 GMT
On Sat, Mar 10, 2012 at 8:25 PM, Paul Davis <> wrote:
> On Sat, Mar 10, 2012 at 1:01 PM, Matthieu Rakotojaona
> <> wrote:
>> Hello,
>> Wow, thank you for the very comprehensive answer.
>> On Thu, Mar 8, 2012 at 10:39 PM, Paul Davis <> wrote:
>>>> And the initial purpose of my mail comes here. I just added a few
>>>> documents to my db (1.7+M) and found that the disk_size gives me
>>>> ~2.5 GB, while the data_size is around 660 MB. From what I read, a
>>>> compaction is supposed to leave you with data_size ~= disk_size; yet,
>>>> after numerous compactions, it doesn't shrink a bit.
>>> I bet you have random document ids which will indeed cause the
>>> database file to end up with a significant amount of garbage left
>>> after compaction. I'll describe why below.
>> Yup. I already had my own ids, but they were not in order as I read
>> through the file. Now that CouchDB stores my rows with its own
>> generated IDs (using the 'sequential' algorithm), my whole DB shrank
>> down to 500 MB. Very neat.
>>>> * If yes, can you move the temporary db.compact.couch file somewhere
>>>>        else and link to it so that couchdb thinks nothing has changed
>>> I'm not sure what you mean here.
>> In case I see that I will run out of storage space, as happened to
>> me, I would like the .compact file to be created and used on another
>> disk, but I didn't see such an option in the config file. So I thought
>> something like this would do the trick:
>> 1. Launch compaction
>> 2. Pause it (actually, stop the server for now)
>> 3. Move the .compact created file somewhere else, and symlink to it
>> 4. Continue compaction
>> This flow could also be useful if we want to use an SSD to do a
>> (faster) compaction, later writing the DB back to a classic HDD.
>> I resorted to mounting a directory from my data disk onto
>> /var/lib/couchdb, which I'm not really proud of.
>> --
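The move-and-symlink flow quoted above boils down to simple filesystem mechanics. A minimal sketch in Python, using temporary directories as hypothetical stand-ins for /var/lib/couchdb and the spare disk (no real CouchDB involved):

```python
import os
import shutil
import tempfile

# Hypothetical stand-ins for the real data directory and the spare disk.
data_dir = tempfile.mkdtemp()
big_disk = tempfile.mkdtemp()

compact_file = os.path.join(data_dir, "mydb.couch.compact")
moved_file = os.path.join(big_disk, "mydb.couch.compact")

# Steps 1-2: with the server stopped, the partially written .compact
# file sits in the data directory (simulated here with an empty file).
open(compact_file, "w").close()

# Step 3: move it to the roomier disk and symlink it back, so the
# server finds it at the path it expects when compaction resumes.
shutil.move(compact_file, moved_file)
os.symlink(moved_file, compact_file)

print(os.path.islink(compact_file))  # True
```

Whether CouchDB actually resumes a compaction through a symlink is exactly the open question of this thread; the sketch only shows the file manipulation itself.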
> On one hand this makes a lot of sense; on the other, though, it might
> cause a bit of an issue for people. If we allow people to specify a
> different directory that ends up on a different disk, then the atomic
> rename that we rely on becomes a possibly quite lengthy copy between
> two devices. Since this swap is serialized in the couch_db_updater
> code, it would render a database unresponsive to any traffic during
> that possibly lengthy copy. It's possible that we could have a two-step
> process, but that would require a bit more trickery and I'm not sure
> it'd be worth it in the general case.
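The constraint Paul describes can be seen from userland: rename(2) is atomic and near-instant only within a single filesystem, and fails with EXDEV across devices, leaving a full copy as the fallback. A sketch in Python (illustrative only; CouchDB's actual swap is Erlang code in couch_db_updater):

```python
import errno
import os
import shutil
import tempfile

def swap_in_compacted(src, dst):
    """Swap the compacted file into place.

    os.rename() is atomic on one filesystem, but across devices it
    raises EXDEV, and the only fallback is a full, non-atomic copy,
    during which the database would sit unresponsive.
    """
    try:
        os.rename(src, dst)        # same device: atomic, instant
        return "rename"
    except OSError as e:
        if e.errno != errno.EXDEV:
            raise
        shutil.copy2(src, dst)     # different devices: lengthy copy
        os.remove(src)
        return "copy"

# Demo on one filesystem, so the atomic path is taken.
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "mydb.couch.compact")
dst = os.path.join(tmp, "mydb.couch")
with open(src, "w") as f:
    f.write("compacted data")
result = swap_in_compacted(src, dst)
print(result)  # rename
```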

OK, I see what you mean. This kind of modification would be useful for
admins, as it speeds up the whole process, but it poses the risk of a
period of unavailability for the DB's users.

Thanks for the explanations!

