From: Randall Leeds <randall.leeds@gmail.com>
To: user@couchdb.apache.org
Date: Wed, 1 Dec 2010 13:44:31 -0500
Subject: Re: MVCC and Compaction on Update heavy DB

I can't remember whether it was Selena or Josh who covered Postgres' vacuum system in some depth in a talk at CouchCamp. My knowledge is far from deep, but from what I gathered, Postgres has a much more complicated vacuum system: it tries to reclaim space within the DB file and has to deal with long-running transactional UPDATE commands and the like. Contrast this with Couch, which has no bulk transactional semantics and a dead-simple compaction system that writes the entire database out to a fresh file. While there will be some racing to flush new writes if the database file is being updated concurrently, compaction always seems to finish, though it may take a few (progressively shorter) passes.

As for when to compact: if you know your rough insert/update ratio, you can do some calculations based on the update sequence and doc count to decide when running compaction would reclaim space. In my experience it has often been easiest to just run compaction every night, and I've heard of production environments that run continuous compaction.
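To make that calculation concrete, here is a minimal sketch of the kind of heuristic Randall describes. It assumes you've already fetched `update_seq` and `doc_count` from the database info document (a GET on the database URL returns both); the helper names and the 2.0 threshold are illustrative, not anything CouchDB ships:

```python
def churn_ratio(update_seq, doc_count):
    """Rough ratio of total updates ever made to live documents.

    A ratio well above 1.0 means many obsolete revisions are
    sitting in the file, so compaction should reclaim real space.
    """
    if doc_count == 0:
        return float("inf")
    return update_seq / doc_count


def should_compact(update_seq, doc_count, threshold=2.0):
    # Compact once updates outnumber live docs `threshold` times over.
    # The threshold is a tunable guess, not a CouchDB recommendation.
    return churn_ratio(update_seq, doc_count) >= threshold
```

For example, a database with 2,000 live docs but an update sequence of 10,000 has seen five updates per document on average, so a nightly job checking `should_compact` would kick off compaction.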
CouchDB compaction could probably be made faster even without switching away from the rewrite-db-and-swap method, but in its current form I've found it best to provision production servers under the assumption that compaction is always running: it may take quite a while to finish, and relying on compacting during "off-peak" times may not be possible.

Randall

On Wed, Dec 1, 2010 at 10:56, Brad King wrote:
> Hi, I've been away from CouchDB for probably 2 years. The project has
> made great strides, it appears. My question is around MVCC and
> compaction. We currently run a 4-node PostgreSQL farm. We often
> experience problems trying to garbage-collect dead tuples (vacuuming)
> due to MVCC. Keeping up with this is a big problem in Postgres. I know
> this is an entirely different product, but what is the recommended setup
> for a similar deployment on CouchDB? With MVCC, it seems that for heavy
> updates or deletes we could have the same problems. The docs indicate
> it's possible to get behind on this and consume all disk. Is there any
> guidance on recommended hardware configuration, max database size,
> compaction schedules, etc. for a full production deployment that is
> update/delete-heavy? Thanks.
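On Brad's disk question: because compaction copies the live data into a fresh file alongside the old one, peak disk usage during a compaction run is roughly the old file plus a new file about the size of the live data. A back-of-the-envelope provisioning sketch (the helper and the 25% padding factor are illustrative assumptions; "live data size" may itself have to be estimated from doc counts on older releases):

```python
def compaction_headroom(live_data_bytes, growth_factor=1.25):
    """Estimate free bytes needed before starting compaction.

    Compaction writes live data into a new file next to the old
    one, so you need free space for roughly another copy of the
    live data, padded for writes that land mid-compaction.
    """
    return int(live_data_bytes * growth_factor)


# A 40 GB file holding only 10 GB of live data needs free space for
# a ~10 GB new file (plus padding), not another 40 GB.
needed = compaction_headroom(10 * 2**30)
```

The practical upshot matches Randall's advice: size disks so this headroom is always available, rather than hoping compaction only runs when the database is quiet.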