incubator-couchdb-user mailing list archives

From Bob Clary <...@bclary.com>
Subject Re: operational file size
Date Sun, 09 Jan 2011 20:44:14 GMT
Jeffrey,

Randall makes several good points and covers many of the issues you will
need to handle; however, I'd like to chime in with some of the lessons I
have learned from my own experience.

The estimate that your maximum database size should be less than 1/2 of
your free disk space is a good starting point, but you also need to
consider the disk space consumed by your views. Compacting a view can
also require up to twice its size. If your view sizes are on the same
order as your database size, then you can expect your maximum database
size to be 1/4 of your free disk space. This doesn't take into account
the current issue in CouchDB where some initial view sizes may be 10-20
times their final compacted size.
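
To make the arithmetic concrete: if compaction needs up to 2x the
database size D and up to 2x the view size (assumed here to be a ratio r
of D), the budget is 2*D + 2*r*D <= free space. A minimal sketch of that
rule; the function name and the ratio parameter are my own illustration,
and it deliberately ignores the temporary 10-20x view bloat mentioned
above.

```python
def max_database_size(free_space: float, view_to_db_ratio: float = 1.0) -> float:
    """Largest database that can still be compacted together with its views.

    Assumes compaction needs up to 2x the database size and up to 2x the
    total view size, with views = view_to_db_ratio * database size (a
    hypothetical parameter).  Does NOT model initial view bloat.
    """
    if free_space < 0 or view_to_db_ratio < 0:
        raise ValueError("inputs must be non-negative")
    # 2*D + 2*r*D <= free  =>  D <= free / (2 * (1 + r))
    return free_space / (2 * (1 + view_to_db_ratio))
```

With 100GB free and views about as large as the database this gives
max_database_size(100, 1.0) == 25.0, i.e. the 1/4 figure; with no views
it falls back to the 1/2 rule (50.0).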

Regularly compacting your database *and* views is critical to limiting
your maximum disk usage. Until the issue where compaction leaves file
handles open on the deleted old copies of the files is resolved, you
will also need to periodically restart your CouchDB server to free the
space still held by the old versions of the files. It is important to
monitor not only the database and view sizes but also the actual free
space reported by the system. If you see the free space continuing to
decrease to a dangerous level after repeated compactions, you need to
restart CouchDB or risk running out of space on the entire machine.
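
A rough sketch of that routine, assuming the standard CouchDB HTTP
endpoints POST /{db}/_compact (database) and POST /{db}/_compact/{ddoc}
(the views of one design document, named without the "_design/" prefix).
The "restart needed" rule (free space shrank after every compaction round
and is now below a threshold you choose) is only my reading of the
advice above, not anything CouchDB provides; the helper names are
hypothetical.

```python
import shutil
import urllib.parse
import urllib.request
from typing import List, Sequence


def build_compaction_requests(base_url: str, db: str,
                              design_docs: Sequence[str]) -> List[urllib.request.Request]:
    """POST requests that compact a database and the views of each design doc."""
    base = base_url.rstrip("/")
    db_path = urllib.parse.quote(db, safe="")  # a "/" in a db name must be sent as %2F
    urls = [f"{base}/{db_path}/_compact"]
    urls += [f"{base}/{db_path}/_compact/{urllib.parse.quote(d, safe='')}"
             for d in design_docs]
    # CouchDB expects a JSON content type on compaction requests.
    return [urllib.request.Request(url, data=b"", method="POST",
                                   headers={"Content-Type": "application/json"})
            for url in urls]


def free_bytes(path: str) -> int:
    """Free space on the filesystem holding `path`, as reported by the OS."""
    return shutil.disk_usage(path).free


def free_space_is_dangerous(samples_after_compaction: Sequence[int],
                            danger_threshold: int) -> bool:
    """Hypothetical rule: free space fell after every compaction round
    and the latest reading is below the threshold."""
    if len(samples_after_compaction) < 2:
        return False  # need at least two rounds to see a trend
    pairs = zip(samples_after_compaction, samples_after_compaction[1:])
    shrinking = all(later < earlier for earlier, later in pairs)
    return shrinking and samples_after_compaction[-1] < danger_threshold
```

Each request can be sent with urllib.request.urlopen(req) (add
credentials if your server has an admin account). Record
free_bytes() for the CouchDB data directory after each round; the
directory depends on your installation.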

The strategy of replicating to bigger machines will work up to a point
(see below) as long as the load on your database is not too great and
the database and views do not need to be compacted too often. However,
replicating a large database with millions of documents will take a
long time, and you may not have enough time to move to a larger machine
before you run out of space if the database and views need to be
compacted several times during the replication.
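
One way to sanity-check the timing is to compare how long replication
should take with how long the current disk will last. A back-of-envelope
sketch; the throughput, growth rate (which should include the extra
copies written by compaction) and safety factor are hypothetical inputs
you would have to measure or choose yourself.

```python
import math
from dataclasses import dataclass


@dataclass
class ReplicationPlan:
    doc_count: int                  # documents to copy to the bigger machine
    docs_per_second: float          # measured replication throughput (assumed)
    free_bytes: float               # free space left on the current machine
    growth_bytes_per_second: float  # net disk growth incl. compaction copies (assumed)


def hours_to_replicate(plan: ReplicationPlan) -> float:
    return plan.doc_count / plan.docs_per_second / 3600


def hours_until_full(plan: ReplicationPlan) -> float:
    if plan.growth_bytes_per_second <= 0:
        return math.inf  # not growing: the disk never fills at this rate
    return plan.free_bytes / plan.growth_bytes_per_second / 3600


def replication_fits(plan: ReplicationPlan, safety_factor: float = 2.0) -> bool:
    """True if replication, padded by safety_factor, ends before the disk fills."""
    return hours_to_replicate(plan) * safety_factor <= hours_until_full(plan)
```

The safety factor is there because throughput tends to drop while the
source is also serving load and compacting; the estimate does not model
that directly.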

Finally, once your database views grow large enough you will run into
the issue where CouchDB crashes after compacting your views, so the
view is deleted and has to be rebuilt from the beginning. This view
creation-compaction-crash-creation cycle can take more than a day with
a large database, will leave any parts of your application that depend
on these views unusable, and won't be resolved by replicating to a
machine with a larger disk.

In summary, I think the initial free disk space should be 4 times the
expected size of your database and that, depending on your views, there
is currently an absolute limit beyond which CouchDB will become
unusable. In my case it was a compacted database of 40G with about 10
million documents.

bc

On 1/8/11 12:31 PM, Randall Leeds wrote:
> It's hard to estimate how big the compacted database will be given the
> size of the original. In the worst case (when your database is already
> compacted), compacting it again will double your usage, since it
> creates a whole new, optimized copy of the database file.
>
> More likely is that the original is not compact and so the new file
> will be much smaller.
>
> Clearly, then, the answer is that if you want to be ultra safe no
> single database should exceed 50% of your capacity. However, it is
> safe to have many small databases such that the total disk consumption
> is much higher.
>
> The best solution is to regularly compact your databases and track the
> usage and size differences so you get a good sense of how fast you're
> growing. And remember, if you find yourself in a sticky situation
> where you can't compact you probably still have plenty of time to
> replicate to a bigger machine or a hosted cluster such as offered by
> Cloudant. Good monitoring is the best way to avoid disaster.
>
> On Sat, Jan 8, 2011 at 10:39, Jeffrey M. Barber <zengeneral@gmail.com> wrote:
>> If I'm running CouchDB with 100GB of disk space, what is the maximum CouchDB
>> database size such that I'm still able to optimize?
>>
>> I remember running out of room on a rackspace machine, and I got the
>> strangest of error codes when trying to run CouchDB.
>>
>> -J
>>
>

