Mailing-List: contact user-help@couchdb.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@couchdb.apache.org
MIME-Version: 1.0
In-Reply-To: <4D2A1E1E.4090500@bclary.com>
References: <AANLkTin4oRp279JhJkUe2rk-sCbY=rC3H_rCQ60kjua_@mail.gmail.com>
	<AANLkTi=qSaHYpnu+Q6ZQCn8O+FxmN=bnjKtmFCC6fyX7@mail.gmail.com>
	<4D2A1E1E.4090500@bclary.com>
Date: Sun, 9 Jan 2011 14:05:31 -0800
Message-ID: <AANLkTinZmU5VKn9F1SiGe_mcRRkeAGHNuh_UQDevR9dk@mail.gmail.com>
Subject: Re: operational file size
From: Randall Leeds <randall.leeds@gmail.com>
To: user@couchdb.apache.org
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

On Sun, Jan 9, 2011 at 12:44, Bob Clary <bob@bclary.com> wrote:
> Jeffrey,
>
> Randall makes several good points and covers many of the issues you will
> need to handle; however, I'd like to chime in with some of the lessons I
> have learned from my experiences.
>
> The estimate that your maximum database size should be less than 1/2 of
> your free disk space is a good starting point, but you also need to
> consider the disk space consumed by your views. They will also require a
> maximum of twice their size to compact. If your view sizes are on the same
> order as your database size, then you can expect your maximum database
> size to be 1/4 of your free disk space. This doesn't take into account the
> current issue in CouchDB where some initial view sizes may be 10-20 times
> their final compacted size.
>
> Regularly compacting your database *and* views is critical to limiting
> your maximum disk usage. Until the issue where compaction leaves file
> handles open for deleted old copies of files is resolved, you will also
> need to periodically restart your CouchDB server in order to free the
> space from the old versions of the files. Monitoring not only the database
> and view sizes but also the actual free space reported by the system is
> important. If you see the free space continuing to decrease to a dangerous
> level after repeated compactions, you need to restart the database or risk
> running out of space on the entire machine.
>

The issue you refer to is here[1] and it's been fixed for the upcoming
1.0.2 and 1.1 releases.
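
Until you are on a fixed release, the monitoring Bob describes could be
sketched roughly as below. This is only an illustration, not a tested
tool: the helper names, the idea of sampling free space once after each
compaction, and the danger threshold are all assumptions of mine. The only
CouchDB-specific parts are the compaction endpoints (POST /db/_compact for
a database, POST /db/_compact/designname for the views of one design
document), which need admin rights and a JSON content type.

```python
import shutil
import urllib.request


def compaction_request(base_url, db, design_doc=None):
    """Build (but do not send) the POST request asking CouchDB to compact a
    database, or the views of one design document if design_doc is given."""
    path = f"/{db}/_compact"
    if design_doc is not None:
        path += f"/{design_doc}"
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=b"",
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def free_bytes(path):
    """Free space, in bytes, that the OS reports for the filesystem at path."""
    return shutil.disk_usage(path).free


def free_space_keeps_shrinking(samples, danger_bytes):
    """samples: free-space readings in bytes, one per compaction, oldest first.

    Hypothetical heuristic: True when every reading is lower than the one
    before it and the latest reading is below danger_bytes, which is the
    situation where a restart is needed to release the old files."""
    if len(samples) < 2:
        return False
    falling = all(b < a for a, b in zip(samples, samples[1:]))
    return falling and samples[-1] < danger_bytes
```

To actually trigger a compaction you would pass the request to
urllib.request.urlopen; record free_bytes() for the CouchDB data directory
after each run and restart the server when free_space_keeps_shrinking()
returns True.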

> The replication strategy to bigger machines will work up to a point (see
> below) as long as the load on your database is not too great and the
> database and views do not need to be compacted too often. However,
> replicating a large database with millions of documents will take a long
> time, and you may not have sufficient time to move to a larger machine
> before you run out of space if the database and views need to be compacted
> several times during the replication.
>
> Finally, once your database views grow large enough you will run into the
> issue where CouchDB will crash after compacting your views, resulting in
> the view being deleted and having to be recreated from the beginning. This
> view creation-compaction-crash-creation cycle can take more than a day
> with a large database, will leave any parts of your application that
> depend on these views unusable, and won't be resolved through replication
> to a machine with a larger disk.
>

That's a more disturbing issue and it looks like no one's addressed it
yet. I'll comment on the JIRA ticket and see if we can get some
movement on it. I know it hasn't been around forever, since older
releases did not exhibit this behavior. I bet we can track it down.

> In summary, I think the initial free disk space should be 4 times the
> expected size of your database and, depending on your views, that there is
> currently an absolute limit beyond which CouchDB will become unusable. In
> my case it was a compacted database of 40 GB holding about 10 million
> documents.
>
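
To put numbers on the 1/2 and 1/4 rules of thumb above, here is a rough
sketch. It assumes the worst case described in this thread, where
compaction can need up to twice the current size of both the database
file and the view files. The function name and the view_to_db_ratio
parameter are made up for illustration, and the sketch ignores the 10-20x
initial view size issue Bob mentions.

```python
def max_safe_db_size(disk_bytes, view_to_db_ratio=0.0):
    """Largest database size (bytes) that leaves room to compact.

    Assumed worst case: database and views each need up to 2x their size,
    so 2 * db * (1 + view_to_db_ratio) must fit within disk_bytes.
    """
    if disk_bytes < 0 or view_to_db_ratio < 0:
        raise ValueError("sizes and ratios must be non-negative")
    return disk_bytes / (2 * (1 + view_to_db_ratio))


GB = 10**9
# No views: the familiar "at most half the disk" rule.
print(max_safe_db_size(100 * GB) / GB)       # 50.0
# Views about as large as the database: a quarter of the disk.
print(max_safe_db_size(100 * GB, 1.0) / GB)  # 25.0
```

For Jeffrey's 100GB disk that gives roughly 50GB with no views and 25GB
when the views are about as large as the database.
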
> bc
>
> On 1/8/11 12:31 PM, Randall Leeds wrote:
>>
>> It's hard to estimate how big the compacted database will be given the
>> size of the original. In the worst case (when your database is already
>> compacted), compacting it again will double your usage, since it
>> creates a whole new, optimized copy of the database file.
>>
>> More likely is that the original is not compact and so the new file
>> will be much smaller.
>>
>> Clearly, then, the answer is that if you want to be ultra safe no
>> single database should exceed 50% of your capacity. However, it is
>> safe to have many small databases such that the total disk consumption
>> is much higher.
>>
>> The best solution is to regularly compact your databases and track the
>> usage and size differences so you get a good sense of how fast you're
>> growing. And remember, if you find yourself in a sticky situation
>> where you can't compact you probably still have plenty of time to
>> replicate to a bigger machine or a hosted cluster such as offered by
>> Cloudant. Good monitoring is the best way to avoid disaster.
>>
>> On Sat, Jan 8, 2011 at 10:39, Jeffrey M. Barber <zengeneral@gmail.com>
>> wrote:
>>>
>>> If I'm running CouchDB with 100GB of disk space, what is the maximum
>>> CouchDB database size such that I'm still able to optimize?
>>>
>>> I remember running out of room on a Rackspace machine, and I got the
>>> strangest of error codes when trying to run CouchDB.
>>>
>>> -J
>>>
>>
>
>

[1] https://issues.apache.org/jira/browse/COUCHDB-926