incubator-couchdb-user mailing list archives

From: Randall Leeds <>
Subject: Re: Why do my CouchDB databases grow so fast?
Date: Mon, 31 May 2010 23:41:18 GMT
Also, if you were to adjust the scale, I believe the two graphs would
have a similar slope for the first several updates.

On Mon, May 31, 2010 at 16:39, Randall Leeds <> wrote:
> Hi Konrad,
> I'll take a stab at this and if I'm wrong hopefully someone will correct me.
> The on-disk B-tree is written in an append-only fashion rather than
> modified in place. Append-only updates mean that every inner node of
> the B-tree along the path from the root to the updated leaf has to be
> rewritten each time. Initially, when there are very few inner nodes,
> the amount of disk space used for each new update is relatively
> constant. Since the tree has a large fan-out, the depth does not change
> much at first. In the second graph you are seeing a tree with a
> depth of 1 (just the root) being written over and over again to disk,
> and the expected linear growth results. However, when you have a
> higher revision limit the old revisions are kept in the tree, and the
> tree grows taller and fatter with each update. As you make more
> updates, more inner nodes need to be rewritten for each update, which
> causes the growth to accelerate. Eventually you hit the revision
> limit, old revisions are discarded, the tree stops getting any taller
> or fatter, and the number of inner nodes that need to be changed for
> each update remains relatively constant (but greater than in the case
> of _revs_limit=1). I suspect that the first graph becomes linear
> above 1000 updates and does not continue to accelerate.
> Cheers,
> Randall
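
(A rough illustration of the argument above: a toy Python model with
invented fanout and node-size parameters. This is not CouchDB's actual
file format; it only shows why the per-update cost grows until the
revision limit caps the tree.)

    import math

    def simulate(updates, revs_limit, fanout=8, node_bytes=4096):
        """Cumulative file size after each update of an append-only B-tree."""
        size, sizes = 0, []
        for i in range(1, updates + 1):
            live = min(i, revs_limit)   # revisions kept in the tree, capped by the limit
            # the root-to-leaf path gets longer as the number of live entries grows
            depth = max(1, math.ceil(math.log(live, fanout))) if live > 1 else 1
            size += depth * node_bytes  # append-only: the whole path is rewritten
            sizes.append(size)
        return sizes

    second_run = simulate(1200, revs_limit=1)    # depth stays 1: linear growth
    first_run = simulate(1200, revs_limit=1000)  # accelerates, then linear past 1000

Plotting first_run and second_run gives the accelerate-then-linear
versus straight-line shapes described above.
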
> 2010/5/31 Konrad Förstner <>:
>> Hi,
>> I have an issue with CouchDB and posted the question on Stack Overflow
>> [1] but did not get any helpful answer. It would be great if somebody
>> could answer it here or on Stack Overflow (there I also had a problem
>> with the compaction, which was just a timing issue, as explained in
>> the comments).
>> I was wondering why my CouchDB database was growing so fast, so I wrote
>> a little test script [2] (a sketch of a similar script follows the list
>> below). The script changes an attribute of a CouchDB document 1200
>> times and records the size of the database after each change. After
>> performing these 1200 write steps, the database is compacted and the
>> size is measured again. Finally, the script plots the database size
>> against the revision numbers. The benchmark is run twice:
>> * The first time, the default number of document revisions (_revs_limit = 1000) is used.
>> * The second time, the number of document revisions (_revs_limit) is set to 1.
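
(A minimal sketch of such a benchmark using only the Python standard
library; the actual script is the one linked as [2]. The database name
"growth_test" and the "counter" field are illustrative, and it assumes
CouchDB listening on http://127.0.0.1:5984 with the database already
created.)

    import json
    import urllib.request

    BASE = "http://127.0.0.1:5984/growth_test"

    def request(method, path="", body=None):
        data = json.dumps(body).encode() if body is not None else None
        req = urllib.request.Request(BASE + path, data=data, method=method,
                                     headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)

    request("PUT", "/_revs_limit", 1000)  # set to 1 for the second run
    rev = request("PUT", "/doc", {"counter": 0})["rev"]
    sizes = []
    for i in range(1, 1201):
        rev = request("PUT", "/doc", {"counter": i, "_rev": rev})["rev"]
        sizes.append(request("GET")["disk_size"])  # db info includes the file size
    request("POST", "/_compact", {})  # then poll the db info until compact_running is false
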
>> The first run produces the following plot:
>> [plot: database size vs. revision number, _revs_limit = 1000]
>> The second run produces this plot:
>> [plot: database size vs. revision number, _revs_limit = 1]
>> For me this is quite unexpected behavior. In the first run I would
>> have expected linear growth, as every change produces a new
>> revision; once the 1000 revisions are reached, the size should stay
>> constant, as the older revisions are discarded.
>> In the second run the first revision should result in a certain
>> database size that is then kept during the following write steps, as
>> every new revision leads to the deletion of the previous one.
>> I could understand if a little bit of overhead were needed to
>> manage the changes, but this growth behavior seems weird to me. Can
>> anybody explain this phenomenon or correct the assumptions that led
>> me to the wrong expectations?
>> Many thanks in advance
>> Konrad
>> [1]
>> [2]
