Date: Wed, 5 Aug 2009 11:54:48 +0100
From: Brian Candler
To: "Jan Lehnardt (JIRA)"
Cc: dev@couchdb.apache.org
Subject: Re: [jira] Commented: (COUCHDB-449) Turn off delayed commits by default
Message-ID: <20090805105448.GA9881@uk.tiscali.com>
In-Reply-To: <583082672.1249467074859.JavaMail.jira@brutus>

On Wed, Aug 05, 2009 at 03:11:14AM -0700, Jan Lehnardt (JIRA) wrote:
> [delayed_commits]
> dbname = true
> dbname2 = false
> ...
> ...
>
> so you can have a "safe" db for your app and a "fast" db for, say, logging.

Or perhaps you could set a different periodic flush interval for each
database, with 0 equivalent to no delayed commit.

For me, the question is specifically: what guarantees does CouchDB give to
clients about the safety of their data, and when? For example, at the point
where you get an HTTP response?

There are at least three different scenarios that I'm aware of at the
moment.

1. client supplies 'batch=ok' URL parameter
2. client supplies no special parameters
3. client supplies 'X-Couch-Full-Commit: true' header

From the client's perspective, I can see no difference between (1) and (2).
After receiving an HTTP response, the data is likely to make it to disk at
some time in the future, but it could be lost if the plug is pulled in the
next few seconds.

In case (3), the document is guaranteed to be on disk after the HTTP
response is returned [as long as the drive's internal write cache is
disabled]. This is equivalent to "QOS level 1" in the MQTT protocol:
http://publib.boulder.ibm.com/infocenter/wmbhelp/v6r0m0/index.jsp?topic=/com.ibm.etools.mft.doc/ac10850_.htm

However, it also forces writes of everything received up to this point, so
it's very inefficient if you are doing lots of writes with this header on.

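
To make the three scenarios concrete, here is a minimal sketch of how a
client might build each request with Python's standard urllib. The server
address, the database name "mydb" and the document id are placeholders of
mine, not something from the JIRA issue, and the requests are only built
here, never sent.

```python
import json
import urllib.request

# Placeholder server and database (assumptions for illustration only).
BASE = "http://localhost:5984/mydb"


def build_put(doc_id, doc, batch=False, full_commit=False):
    """Build (but do not send) a PUT request for one of the three scenarios."""
    url = f"{BASE}/{doc_id}"
    if batch:
        url += "?batch=ok"                        # scenario 1
    headers = {"Content-Type": "application/json"}
    if full_commit:
        headers["X-Couch-Full-Commit"] = "true"   # scenario 3
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="PUT")


doc = {"msg": "hello"}
scenarios = {
    1: build_put("doc1", doc, batch=True),
    2: build_put("doc1", doc),                    # scenario 2: nothing special
    3: build_put("doc1", doc, full_commit=True),
}
for n, req in scenarios.items():
    print(n, req.get_method(), req.full_url, dict(req.header_items()))
```

Only the URL parameter and the one header differ; what the server promises
in each case is the open question.
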
Sometimes, you don't require data to be written to disk immediately, but you
do want to be notified *when* it has been written to disk in order to take
some subsequent action (such as acknowledging the successful save to a
downstream consumer).

I would like to propose an alternative approach similar to TCP sequence
numbers. We already have a sequence number which counts documents added to
the database (update_seq). I suggest we keep a separate watermark which is
the sequence number when the database was last flushed to disk (say
flush_seq). Now:

- when you PUT a document, send the update_seq as part of the response
  (let's call it doc_seq)
- update_seq may continue to increment as more documents are updated
- at some point in the future, when data is flushed to disk,
  set flush_seq := update_seq
- if the client is interested to know when its document has been flushed to
  disk, it can poll mydb to check for flush_seq >= doc_seq
- it could be an option in the HTTP request to delay the response until
  flush_seq >= doc_seq

That means you would get the benefit of knowing that the document had been
committed to disk, without the cost of having to commit it. Rather, you wait
until someone else wants to force a full commit, or the periodic full commit
takes place.

Then the only per-database tunable you need is the periodic commit interval.
Set it to 5 seconds for logging databases; 0.2 for RADIUS accounting (where
you want to generate a response within 200ms); and 0 if you want every
single document to be committed as soon as it arrives.

Thoughts?

Something like this is doable at present, but requires a buffering proxy.
For example, you can receive RADIUS accounting updates into a buffer, then
every 200ms do a POST to _bulk_docs with X-Couch-Full-Commit: true and
return success to all the clients. Since CouchDB has to buffer these
documents in the VFS cache anyway, it would be convenient (and more
efficient) to let it handle the periodic flushing too.

Regards,

Brian.
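
P.S. A toy sketch of the watermark idea, in case it helps the discussion.
This is an in-memory model only, not CouchDB code: the class and method
names are made up for illustration, and flush() merely stands in for
whatever the periodic or forced full commit would do.

```python
class Database:
    """Toy in-memory model of the proposed update_seq / flush_seq watermark."""

    def __init__(self):
        self.update_seq = 0   # counts document updates accepted so far
        self.flush_seq = 0    # value of update_seq at the last flush to disk

    def put(self, doc):
        """Accept a document and return its doc_seq (the new update_seq)."""
        self.update_seq += 1
        return self.update_seq

    def flush(self):
        """Stand-in for a periodic or forced full commit."""
        self.flush_seq = self.update_seq

    def is_durable(self, doc_seq):
        """The check a polling client would make: flush_seq >= doc_seq."""
        return self.flush_seq >= doc_seq


db = Database()
first = db.put({"n": 1})
second = db.put({"n": 2})
print(db.is_durable(first))    # False: nothing has been flushed yet
db.flush()
third = db.put({"n": 3})
print(db.is_durable(second))   # True: covered by the flush
print(db.is_durable(third))    # False: arrived after the flush
```

A single flush makes every document with doc_seq <= flush_seq durable at
once, which is why a waiting client never has to force a commit of its own.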