couchdb-dev mailing list archives

From Adam Kocoloski <kocol...@apache.org>
Subject Re: batch=ok for bulk_docs and single doc implementation concerns
Date Wed, 14 Apr 2010 12:23:45 GMT
On Apr 14, 2010, at 7:59 AM, Matt Goodall wrote:

> Hi,
> 
> Over in couchdb-python land someone wanted to use batch=ok when
> creating and updating documents, so we added support.
> 
> I was semi-surprised to notice that _bulk_docs does not support
> batch=ok. I realise _bulk_docs essentially is a batch update but a
> _bulk_docs batch=ok would presumably allow CouchDB to buffer more in
> memory before writing to disk. What are your thoughts?

It's probably of limited utility.  If you're already batching on the client side, you can achieve
the same effect by sending in a larger batch.  I'm not opposed to it per se; I just don't think
it will help with throughput all that much.
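
As a rough illustration of "sending in a larger batch" from the client side, the sketch below groups documents into JSON request bodies of the form {"docs": [...]}, which is the body shape _bulk_docs accepts. The helper name bulk_docs_bodies and the batch_size parameter are made up for this example; sending each body over HTTP is left out.

```python
import json


def bulk_docs_bodies(docs, batch_size):
    """Yield JSON bodies for POST /<db>/_bulk_docs, batch_size docs per body.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            yield json.dumps({"docs": batch})
            batch = []
    if batch:
        # Flush the final, possibly smaller, batch.
        yield json.dumps({"docs": batch})
```

Raising batch_size is the client-side knob here: fewer, larger requests rather than a server-side buffer.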

> 
> Now, this buffering is where the "implementation concerns" come in.
> According to the wiki:
> 
> "There is a query option batch=ok which can be used to achieve higher
> throughput at the cost of lower guarantees. When a PUT (or a document
> POST as described below) is sent using this option, it is not
> immediately written to disk. Instead it is stored in memory on a
> per-user basis for a second or so (or the number of docs in memory
> reaches a certain point). After the threshold has passed, the docs are
> committed to disk."
> 
> However, unless I'm missing something (quite likely ;-)), there is no
> "stored in memory on a per-user basis" or any check for when "the
> number of docs in memory reaches a certain point". All it seems to do
> is spawn a new process so the update happens when the Erlang scheduler
> gets around to it. In fact, I don't see any reference to the
> batch_save_interval and batch_save_size configuration options in the
> code.

The wiki describes the 0.10 implementation of batch=ok.  In 0.11, batch mode takes advantage
of the fact that couch_db_updater always merges all waiting updates to a DB into a single
write, so it doesn't bother with the separate set of supervised processes accumulating documents.
In effect, the 0.11 batch=ok is "I'm not going to wait around, but save this as soon as you
get a chance".

This does change the performance characteristics quite a bit; in particular, when the underlying
disk is fast the new batch=ok behavior will result in significantly larger uncompacted databases.
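
The merging behavior described above can be pictured with a toy model. This is a sketch in Python, not CouchDB's Erlang code: ToyUpdater, submit, write_once, and simulate are all hypothetical names, and the "disk speed" is modeled only as how many documents arrive between two writes.

```python
from collections import deque


class ToyUpdater:
    """Toy stand-in for an updater that merges all waiting updates per write."""

    def __init__(self):
        self.waiting = deque()  # updates submitted but not yet written
        self.writes = []        # each entry: the docs committed in one write

    def submit(self, doc):
        # batch=ok style: queue the doc and return without waiting.
        self.waiting.append(doc)

    def write_once(self):
        # Merge everything currently waiting into a single write.
        if not self.waiting:
            return 0
        merged = list(self.waiting)
        self.waiting.clear()
        self.writes.append(merged)
        return len(merged)


def simulate(docs, arrivals_per_write):
    """Submit docs, performing one write after every arrivals_per_write docs."""
    updater = ToyUpdater()
    for count, doc in enumerate(docs, start=1):
        updater.submit(doc)
        if count % arrivals_per_write == 0:
            updater.write_once()
    updater.write_once()  # flush whatever is still waiting
    return updater.writes
```

In this model, a "fast disk" corresponds to a small arrivals_per_write: little accumulates between writes, so the same documents end up spread over many more separate writes. That may be the mechanism behind the larger uncompacted databases, though the message does not spell it out.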

> Shouldn't batch=ok send the doc off to some background process that
> accumulates docs until either the batch interval or size threshold has
> been reached? This would also ensure that batch=ok updates are handled
> in the order they arrive, although I'm not sure if that matters given
> that the user has basically said they don't care if it succeeds or not
> by using batch=ok.

I think the document updates are still handled in the order in which they were received.

> 
> - Matt


Best, Adam