From: Brian Candler <b.candler@pobox.com>
To: Chris Anderson
Cc: dev@couchdb.apache.org
Date: Tue, 18 Aug 2009 09:33:49 +0100
Subject: Re: svn commit: r804427 - in /couchdb/trunk: etc/couchdb/default.ini.tpl.in share/www/script/test/delayed_commits.js src/couchdb/couch_db.erl src/couchdb/couch_httpd_db.erl
Message-ID: <20090818083349.GA7599@uk.tiscali.com>

On Sat, Aug 15, 2009 at 10:17:28AM -0700, Chris Anderson wrote:
> One middle ground implementation that could work for throughput, would
> be to use the batch=ok ets based storage, but instead of immediately
> returning 202 Accepted, hold the connection open until the batch is
> written, and return 201 Created after the batch is written. This would
> allow the server to optimize batch size, without the client needing to
> worry about things, and we could return 201 Created and maintain our
> strong consistency guarantees.

Do you mean default to batch=ok behaviour? (In which case, if you don't
want to batch, you'd specify something else, e.g. x-couch-full-commit:
true?) This is fine by me.

Of course, clients doing sequential writes may see very poor performance
(i.e. write - wait for response - write - wait for response, etc).
However, this approach should work well with HTTP pipelining, as well as
with clients which open multiple concurrent HTTP connections. The
replicator would need to do pipelining, if it doesn't already.

As I was attempting to say before: any solution which makes write
guarantees should expose behaviour which is meaningful to the client.

- There's no point doing a full commit on every write unless you delay
  the HTTP response until after the commit (otherwise there's still a
  window where the client thinks the data has gone safely to disk, but
  actually it could be lost).

- There's no point having two different forms of non-safe write, because
  there's no reasonable way for the client to choose between them.
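(To make the "hold the connection open until the batch is written" idea
concrete, here is a toy sketch in Python. It is purely illustrative and
not CouchDB's Erlang implementation: `BatchCommitter`, `batch_size`, and
the string status codes are all invented names for the purpose of the
example. Each writer blocks until the shared batch is flushed, and only
then gets its 201 Created, so the durability guarantee holds.)

```python
import threading

class BatchCommitter:
    """Toy model of the hold-the-connection proposal (hypothetical,
    not CouchDB code): writers block until their batch is flushed in
    one go, and every writer then receives 201 Created at once."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.lock = threading.Lock()
        self.pending = []    # (doc, event) pairs awaiting the flush
        self.committed = []  # docs that have "reached disk"

    def write(self, doc):
        """Called per client connection; returns only after commit."""
        done = threading.Event()
        with self.lock:
            self.pending.append((doc, done))
            if len(self.pending) >= self.batch_size:
                self._flush()
        done.wait()           # connection held open until the flush
        return "201 Created"  # safe to report: the data is durable

    def _flush(self):
        # Simulated single fsync covering the whole batch.
        for doc, done in self.pending:
            self.committed.append(doc)
            done.set()
        self.pending = []

bc = BatchCommitter(batch_size=3)
results = []
threads = [threading.Thread(target=lambda d=d: results.append(bc.write(d)))
           for d in ("a", "b", "c")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # → ['201 Created', '201 Created', '201 Created']
```

The point of the sketch is the ordering: no writer sees a success status
before the batch hits disk, unlike batch=ok's immediate 202 Accepted.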
Currently we have 'batch=ok', and we also have a normal write without
'x-couch-full-commit: true'. Both end up with the data sitting in RAM
for a while before going to disk; the only difference is whether it's
Erlang RAM or VFS buffer cache RAM.

> I like the idea of being able to tune the batch size internally within
> the server. This could allow CouchDB to automatically adjust for
> performance without changing consistency guarantees, eg: run large
> batches when under heavy load, but when accessed by a single user,
> just do full_commits all the time.

I agree. I also think it would be good to be able to tune this per DB,
or more simply, per write. E.g. a PUT request could specify
max_wait=2000 (if not specified, use a default value from the ini file).
Subsequent requests could specify their own max_wait params, and a full
commit would occur when the earliest of these deadlines is reached.
max_wait=0 would then replace the x-couch-full-commit: header, which
seems like a bit of a frig to me anyway.

The server could also prevent such clients from being resource hogs by
specifying a min_wait in the ini file. That is, if you set min_wait=100,
then any client which insists on having a full commit by specifying
max_wait=0 may find itself delayed up to 0.1s before its request is
honoured.

Regards,

Brian.
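(A small sketch of the max_wait/min_wait policy proposed above, in
Python. The function name `commit_delay` and its parameters are
hypothetical; the thread only describes the policy, not an API. The
commit fires at the earliest max_wait among pending requests, with the
ini default substituted where a request gave none, but never sooner than
the server operator's min_wait floor.)

```python
def commit_delay(max_waits, min_wait=0, default_max_wait=2000):
    """Hypothetical scheduling rule from the thread: return the delay
    in ms before the next full commit. max_waits holds one entry per
    pending write; None means the request gave no max_wait and takes
    the ini-file default."""
    deadlines = [default_max_wait if w is None else w for w in max_waits]
    # Earliest requested deadline wins, but the server-side floor
    # (min_wait) stops impatient clients forcing a commit per write.
    return max(min(deadlines), min_wait)

# Three pending writes: 2000ms, unspecified (-> 2000ms), and 0 (full
# commit requested immediately).
print(commit_delay([2000, None, 0]))                # → 0
# With min_wait=100, the max_wait=0 client still waits up to 100ms.
print(commit_delay([2000, None, 0], min_wait=100))  # → 100
```

This captures the two knobs discussed: per-write max_wait replacing the
x-couch-full-commit header, and ini-level min_wait as the hog limiter.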