couchdb-dev mailing list archives

From Chris Anderson <>
Subject Re: svn commit: r804427 - in /couchdb/trunk: etc/couchdb/ share/www/script/test/delayed_commits.js src/couchdb/couch_db.erl src/couchdb/couch_httpd_db.erl
Date Mon, 24 Aug 2009 20:01:37 GMT
On Fri, Aug 21, 2009 at 7:37 PM, Adam Kocoloski<> wrote:
> On Aug 18, 2009, at 4:33 AM, Brian Candler wrote:
>> On Sat, Aug 15, 2009 at 10:17:28AM -0700, Chris Anderson wrote:
>>> One middle ground implementation that could work for throughput, would
>>> be to use the batch=ok ets based storage, but instead of immediately
>>> returning 202 Accepted, hold the connection open until the batch is
>>> written, and return 201 Created after the batch is written. This would
>>> allow the server to optimize batch size, without the client needing to
>>> worry about things, and we could return 201 Created and maintain our
>>> strong consistency guarantees.
>> Do you mean default to batch=ok behaviour? (In which case, if you don't
>> want
>> to batch you'd specify something else, e.g. x-couch-full-commit: true?)
>> This is fine by me. Of course, clients doing sequential writes may see
>> very
>> poor performance (i.e. write - wait response - write - wait response etc).
>> However this approach should work well with HTTP pipelining, as well as
>> with
>> clients which open multiple concurrent HTTP connections. The replicator
>> would need to do pipelining, if it doesn't already.
> Errm, it's going to be tough to pipeline PUTs and POSTs, as that's labeled a
> SHOULD NOT in RFC2616.  Even if we know that it would be safe to pipeline
> PUTs in CouchDB, HTTP clients are probably not going to let it happen.  I
> certainly agree about the connection pool, though.  The replicator does use
> a connection pool, and it pipelines GET requests, too.
>> As I was attempting to say before: any solution which makes write
>> guarantees
>> should expose behaviour which is meaningful to the client.
>> - there's no point doing a full commit on every write unless you delay
>> the HTTP response until after the commit (otherwise there's still a
>> window where the client thinks the data has gone safely to disk,
>> but actually it could be lost)
> Right, and we do delay the response in that case, so I think it is
> meaningful.
>> - there's no point having two different forms of non-safe write, because
>> there's no reasonable way for the client to choose between them.
>> Currently we have 'batch=ok', and we also have a normal write without
>> 'x-couch-full-commit: true' - both end up with the data sitting in RAM
>> for a while before going to disk, the difference being whether it's
>> Erlang RAM or VFS buffer cache RAM.
>>> I like the idea of being able to tune the batch size internally within
>>> the server. This could allow CouchDB to automatically adjust for
>>> performance without changing consistency guarantees, eg: run large
>>> batches when under heavy load, but when accessed by a single user,
>>> just do full_commits all the time.
>> I agree. I also think it would be good to be able to tune this per DB, or
>> more simply, per write.
>> e.g. a PUT request could specify max_wait=2000 (if not specified, use a
>> default value from the ini file). Subsequent requests could specify their
>> own max_wait params, and a full commit would occur when the earliest of
>> these times occurs. max_wait=0 would then replace the x-couch-full-commit:
>> header, which seems like a bit of a frig to me anyway.
>> The server could also prevent clients from being resource hogs by
>> specifying a min_wait in the ini file. That is,
>> if you set min_wait=100, then any client which insists on having a full
>> commit by specifying max_wait=0 may find itself delayed up to 0.1s before
>> its request is honoured.
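The max_wait/min_wait interaction Brian proposes could be sketched like this (illustrative Python, not CouchDB code; the function name is made up, and the clamping rule is my reading of the proposal):

```python
def next_commit_deadline(pending_max_waits, min_wait_ms, now_ms):
    """Pick the time of the next full commit.

    Each pending write request carries a max_wait (ms); the commit
    fires at the earliest of those deadlines, but never sooner than
    min_wait_ms from now, so a client insisting on max_wait=0 can't
    force one fsync per request.
    """
    earliest = min(now_ms + w for w in pending_max_waits)
    return max(earliest, now_ms + min_wait_ms)

# With min_wait=100, a max_wait=0 client is still delayed 100ms:
print(next_commit_deadline([0, 2000], min_wait_ms=100, now_ms=0))
```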
> I interpreted Chris' idea differently.  Instead of exposing yet more ways to
> try to tune the DB, put the tuning logic into the server and let it choose
> when to commit in an attempt to optimize both latency and throughput.


> A simple example might be to group together all outstanding write requests
> and do one commit for the group.  When the write load is low, we commit
> after every update.  When the disk is slow or the write load is high, we
> could have multiple incoming write requests while a single commit is in
> progress.  Instead of committing each one separately (the current behavior
> AFAIK) we'd update them all together like a single _bulk_docs request.  The
> latency for the earliest requests would increase, but the throughput would
> be much higher.
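The grouping Adam describes can be sketched as a commit loop that drains whatever writes queued up while the previous commit was in flight (illustrative Python with threads; the real code would be Erlang message passing, and all names here are made up):

```python
import queue
import threading
import time

writes = queue.Queue()  # incoming write requests (one Event per request)

def committer(fsync_ms=0.05):
    """Drain all queued writes, commit them as one group, repeat."""
    while True:
        batch = [writes.get()]            # block for the first write
        while not writes.empty():         # grab everything queued meanwhile
            batch.append(writes.get_nowait())
        time.sleep(fsync_ms)              # stand-in for a single fsync
        for done in batch:                # answer every request at once
            done.set()

threading.Thread(target=committer, daemon=True).start()

# Under load, many concurrent writers share one commit instead of
# paying for one fsync each; under light load each write commits alone.
events = [threading.Event() for _ in range(10)]
for e in events:
    writes.put(e)
for e in events:
    e.wait()
```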

The one edge case to remember here is that in the current
implementation, all documents written in a batch must share the same
userCtx. That will cut down on the degree of optimization we can get
for multi-user workloads.

Maybe it'd be easy enough to build an internal API that allows writes
from multiple users in the same batch. I think this might be a major
change, though, as we'd have to account for it in the validation logic
and everywhere else that currently assumes that userCtx is part of the
db.
> In a perfect world I'd like to see x-couch-full-commit and _bulk_docs fall
> into disuse.  I realize the latter won't happen because not everyone wants
> to implement an HTTP connection pool.  batch=ok has very different semantics
> and so would still be useful, although I imagine that most uses of batch=ok
> are done to maximize throughput, not minimize latency.  If the throughput of
> normal operation was "high enough" batch=ok probably wouldn't be that
> popular.

I think so too - batch=ok was the easiest way to put an external API
on the server-side batching code, but once we have this new API we'll
probably see it used only in special circumstances.


Chris Anderson
