From: Randall Leeds <randall.leeds@gmail.com>
To: user@couchdb.apache.org
Date: Wed, 1 Dec 2010 13:44:31 -0500
Subject: Re: MVCC and Compaction on Update heavy DB

I can't remember whether it was Selena or Josh who covered Postgres' vacuum system in some depth in a talk at CouchCamp. My knowledge is far from deep, but from what I gathered, Postgres has a much more complicated vacuum system: it tries to reclaim space within the DB file and has to deal with long-running transactional UPDATE commands and the like. Contrast this with Couch, which has no bulk transactional semantics and a dead-simple compaction system that writes the entire database out to a fresh file. While there will be some racing to flush new writes if the database file is being updated concurrently, compaction always seems to finish, though it may take a few (progressively shorter) passes.

As for when to compact: if you know your rough insert/update ratio, you can do some calculations based on the update sequence and doc count to decide when running compaction would reclaim space. In my experience it has often been easiest to just run compaction every night, and I've heard of production environments that run continuous compaction.
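To make that calculation concrete, here is a minimal sketch of the kind of heuristic Randall describes. It assumes you've already fetched `update_seq` and `doc_count` from the database info document (a GET on the database URL returns both); the helper names and the 2.0 threshold are illustrative, not anything CouchDB ships:

```python
def churn_ratio(update_seq, doc_count):
    """Rough ratio of total updates ever made to live documents.

    A ratio well above 1.0 means many obsolete revisions are
    sitting in the file, so compaction should reclaim real space.
    """
    if doc_count == 0:
        return float("inf")
    return update_seq / doc_count


def should_compact(update_seq, doc_count, threshold=2.0):
    # Compact once updates outnumber live docs `threshold` times over.
    # The threshold is a tunable guess, not a CouchDB recommendation.
    return churn_ratio(update_seq, doc_count) >= threshold
```

For example, a database with 2,000 live docs but an update sequence of 10,000 has seen five updates per document on average, so a nightly job checking `should_compact` would kick off compaction.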
CouchDB compaction could probably be made faster even without switching away from the rewrite-db-and-swap method, but in its current form I've found it best to provision production servers under the assumption that compaction is always running: it may take quite a while to finish, and relying on compacting during "off-peak" times may not be possible.

Randall

On Wed, Dec 1, 2010 at 10:56, Brad King wrote:
> Hi, I've been away from CouchDB for probably 2 years. The project has
> made great strides, it appears. My question is around MVCC and
> compaction. We currently run a 4-node PostgreSQL farm. We often
> experience problems trying to garbage-collect dead tuples (vacuuming)
> due to MVCC. Keeping up with this is a big problem in Postgres. I know
> this is an entirely different product, but what is the recommended setup
> for a similar deployment on CouchDB? With MVCC, it seems that for heavy
> updates or deletes we could have the same problems. The docs indicate
> it's possible to get behind on this and consume all disk. Is there any
> guidance on recommended hardware configuration, max database size,
> compaction schedules, etc. for a full production deployment that is
> update/delete-heavy? Thanks.
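On Brad's disk question: because compaction copies the live data into a fresh file alongside the old one, peak disk usage during a compaction run is roughly the old file plus a new file about the size of the live data. A back-of-the-envelope provisioning sketch (the helper and the 25% padding factor are illustrative assumptions; "live data size" may itself have to be estimated from doc counts on older releases):

```python
def compaction_headroom(live_data_bytes, growth_factor=1.25):
    """Estimate free bytes needed before starting compaction.

    Compaction writes live data into a new file next to the old
    one, so you need free space for roughly another copy of the
    live data, padded for writes that land mid-compaction.
    """
    return int(live_data_bytes * growth_factor)


# A 40 GB file holding only 10 GB of live data needs free space for
# a ~10 GB new file (plus padding), not another 40 GB.
needed = compaction_headroom(10 * 2**30)
```

The practical upshot matches Randall's advice: size disks so this headroom is always available, rather than hoping compaction only runs when the database is quiet.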