From: Brian Candler <b.candler@pobox.com>
To: Chris Anderson
Cc: dev@couchdb.apache.org
Date: Tue, 18 Aug 2009 09:33:49 +0100
Subject: Re: svn commit: r804427 - in /couchdb/trunk: etc/couchdb/default.ini.tpl.in share/www/script/test/delayed_commits.js src/couchdb/couch_db.erl src/couchdb/couch_httpd_db.erl
Message-ID: <20090818083349.GA7599@uk.tiscali.com>

On Sat, Aug 15, 2009 at 10:17:28AM -0700, Chris Anderson wrote:
> One middle ground implementation that could work for throughput, would
> be to use the batch=ok ets based storage, but instead of immediately
> returning 202 Accepted, hold the connection open until the batch is
> written, and return 201 Created after the batch is written. This would
> allow the server to optimize batch size, without the client needing to
> worry about things, and we could return 201 Created and maintain our
> strong consistency guarantees.

Do you mean default to batch=ok behaviour? (In which case, if you don't
want to batch, you'd specify something else, e.g. x-couch-full-commit:
true?) This is fine by me.

Of course, clients doing sequential writes may see very poor performance
(i.e. write - wait for response - write - wait for response, etc).
However, this approach should work well with HTTP pipelining, as well as
with clients which open multiple concurrent HTTP connections. The
replicator would need to do pipelining, if it doesn't already.

As I was attempting to say before: any solution which makes write
guarantees should expose behaviour which is meaningful to the client.

- There's no point doing a full commit on every write unless you delay
  the HTTP response until after the commit (otherwise there's still a
  window where the client thinks the data has gone safely to disk, but
  actually it could be lost).

- There's no point having two different forms of non-safe write, because
  there's no reasonable way for the client to choose between them.
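(To make the "hold the connection open until the batch is written" idea
concrete, here is a toy sketch in Python. It is purely illustrative and
not CouchDB's Erlang implementation: `BatchCommitter`, `batch_size`, and
the string status codes are all invented names for the purpose of the
example. Each writer blocks until the shared batch is flushed, and only
then gets its 201 Created, so the durability guarantee holds.)

```python
import threading

class BatchCommitter:
    """Toy model of the hold-the-connection proposal (hypothetical,
    not CouchDB code): writers block until their batch is flushed in
    one go, and every writer then receives 201 Created at once."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.lock = threading.Lock()
        self.pending = []    # (doc, event) pairs awaiting the flush
        self.committed = []  # docs that have "reached disk"

    def write(self, doc):
        """Called per client connection; returns only after commit."""
        done = threading.Event()
        with self.lock:
            self.pending.append((doc, done))
            if len(self.pending) >= self.batch_size:
                self._flush()
        done.wait()           # connection held open until the flush
        return "201 Created"  # safe to report: the data is durable

    def _flush(self):
        # Simulated single fsync covering the whole batch.
        for doc, done in self.pending:
            self.committed.append(doc)
            done.set()
        self.pending = []

bc = BatchCommitter(batch_size=3)
results = []
threads = [threading.Thread(target=lambda d=d: results.append(bc.write(d)))
           for d in ("a", "b", "c")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # → ['201 Created', '201 Created', '201 Created']
```

The point of the sketch is the ordering: no writer sees a success status
before the batch hits disk, unlike batch=ok's immediate 202 Accepted.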
Currently we have 'batch=ok', and we also have a normal write without
'x-couch-full-commit: true'. Both end up with the data sitting in RAM
for a while before going to disk; the only difference is whether it's
Erlang RAM or VFS buffer cache RAM.

> I like the idea of being able to tune the batch size internally within
> the server. This could allow CouchDB to automatically adjust for
> performance without changing consistency guarantees, eg: run large
> batches when under heavy load, but when accessed by a single user,
> just do full_commits all the time.

I agree. I also think it would be good to be able to tune this per DB,
or more simply, per write. E.g. a PUT request could specify
max_wait=2000 (if not specified, use a default value from the ini file).
Subsequent requests could specify their own max_wait params, and a full
commit would occur when the earliest of these deadlines is reached.
max_wait=0 would then replace the x-couch-full-commit: header, which
seems like a bit of a frig to me anyway.

The server could also prevent such clients from being resource hogs by
specifying a min_wait in the ini file. That is, if you set min_wait=100,
then any client which insists on having a full commit by specifying
max_wait=0 may find itself delayed up to 0.1s before its request is
honoured.

Regards,

Brian.
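(A small sketch of the max_wait/min_wait policy proposed above, in
Python. The function name `commit_delay` and its parameters are
hypothetical; the thread only describes the policy, not an API. The
commit fires at the earliest max_wait among pending requests, with the
ini default substituted where a request gave none, but never sooner than
the server operator's min_wait floor.)

```python
def commit_delay(max_waits, min_wait=0, default_max_wait=2000):
    """Hypothetical scheduling rule from the thread: return the delay
    in ms before the next full commit. max_waits holds one entry per
    pending write; None means the request gave no max_wait and takes
    the ini-file default."""
    deadlines = [default_max_wait if w is None else w for w in max_waits]
    # Earliest requested deadline wins, but the server-side floor
    # (min_wait) stops impatient clients forcing a commit per write.
    return max(min(deadlines), min_wait)

# Three pending writes: 2000ms, unspecified (-> 2000ms), and 0 (full
# commit requested immediately).
print(commit_delay([2000, None, 0]))                # → 0
# With min_wait=100, the max_wait=0 client still waits up to 100ms.
print(commit_delay([2000, None, 0], min_wait=100))  # → 100
```

This captures the two knobs discussed: per-write max_wait replacing the
x-couch-full-commit header, and ini-level min_wait as the hog limiter.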