couchdb-dev mailing list archives

From: Matt Goodall <matt.good...@gmail.com>
Subject: batch=ok for bulk_docs and single doc implementation concerns
Date: Wed, 14 Apr 2010 11:59:53 GMT
Hi,

Over in couchdb-python land someone wanted to use batch=ok when
creating and updating documents, so we added support.
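For the curious, usage ends up looking something like this (a sketch,
assuming save() forwards extra keyword options onto the query string,
which is how couchdb-python generally handles them):

    import couchdb

    server = couchdb.Server('http://localhost:5984/')
    db = server['mydb']

    # batch='ok' is passed through as ?batch=ok; CouchDB replies
    # 202 Accepted and defers the write, so no up-to-date _rev comes
    # back (save() returns rev=None in that case).
    db.save({'_id': 'some-doc', 'type': 'example'}, batch='ok')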

I was semi-surprised to notice that _bulk_docs does not support
batch=ok. I realise _bulk_docs is essentially a batch update already,
but a _bulk_docs batch=ok would presumably allow CouchDB to buffer
even more in memory before writing to disk. What are your thoughts?
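To make the suggestion concrete, a batched bulk update might look
something like this (hypothetical, since _bulk_docs ignores the
option today; plain urllib used just for illustration):

    import json
    from urllib import request

    payload = json.dumps({'docs': [{'_id': 'a'}, {'_id': 'b'}]})
    req = request.Request(
        # the ?batch=ok part is the hypothetical bit
        'http://localhost:5984/mydb/_bulk_docs?batch=ok',
        data=payload.encode('utf-8'),
        headers={'Content-Type': 'application/json'},
        method='POST',
    )
    print(request.urlopen(req).read())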

Now, this buffering is where the "implementation concerns" come in.
According to the wiki:

"There is a query option batch=ok which can be used to achieve higher
throughput at the cost of lower guarantees. When a PUT (or a document
POST as described below) is sent using this option, it is not
immediately written to disk. Instead it is stored in memory on a
per-user basis for a second or so (or the number of docs in memory
reaches a certain point). After the threshold has passed, the docs are
committed to disk."

However, unless I'm missing something (quite likely ;-)), there is no
"stored in memory on a per-user basis", nor any check for when "the
number of docs in memory reaches a certain point". All the handler
seems to do is spawn a new process, so the update happens whenever
the Erlang scheduler gets around to it. In fact, I can't find any
reference to the batch_save_interval or batch_save_size configuration
options in the code.

Shouldn't batch=ok instead hand the doc off to some background
process that accumulates docs until either the batch interval or the
size threshold is reached? That would also ensure batch=ok updates
are handled in the order they arrive, although I'm not sure order
matters given that, by using batch=ok, the user has basically said
they don't care whether the update succeeds.
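To sketch the idea (in Python purely for illustration, since CouchDB
itself is Erlang; the DocBatcher name, the commit callable and the
default thresholds are all mine, loosely modelled on the
batch_save_interval/batch_save_size options):

    import threading
    import time

    class DocBatcher:
        """Accumulate docs; flush on a time or size threshold."""

        def __init__(self, commit, interval=1.0, size=1000):
            self.commit = commit      # callable that writes docs to disk
            self.interval = interval  # cf. batch_save_interval (seconds)
            self.size = size          # cf. batch_save_size
            self.docs = []
            self.deadline = None
            self.lock = threading.Lock()
            threading.Thread(target=self._timer_loop,
                             daemon=True).start()

        def add(self, doc):
            with self.lock:
                self.docs.append(doc)  # arrival order is preserved
                if self.deadline is None:
                    self.deadline = time.time() + self.interval
                if len(self.docs) >= self.size:
                    self._flush()

        def _timer_loop(self):
            while True:
                time.sleep(0.05)
                with self.lock:
                    if (self.deadline is not None
                            and time.time() >= self.deadline):
                        self._flush()

        def _flush(self):
            # caller must already hold self.lock
            if self.docs:
                self.commit(self.docs)
            self.docs = []
            self.deadline = None

Each batch=ok request would just call batcher.add(doc). On the
Erlang side the equivalent would presumably be a gen_server per user,
holding docs and flushing on a timeout or size check, which would
also make the wiki's description accurate again.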

- Matt
