couchdb-dev mailing list archives

From Adam Kocoloski <kocol...@apache.org>
Subject Re: batch=ok for bulk_docs and single doc implementation concerns
Date Wed, 14 Apr 2010 14:02:13 GMT
On Apr 14, 2010, at 9:38 AM, Matt Goodall wrote:

> On 14 April 2010 13:23, Adam Kocoloski <kocolosk@apache.org> wrote:
>> On Apr 14, 2010, at 7:59 AM, Matt Goodall wrote:
>> 
>>> Hi,
>>> 
>>> Over in couchdb-python land someone wanted to use batch=ok when
>>> creating and updating documents, so we added support.
>>> 
>>> I was semi-surprised to notice that _bulk_docs does not support
>>> batch=ok. I realise _bulk_docs essentially is a batch update but a
>>> _bulk_docs batch=ok would presumably allow CouchDB to buffer more in
>>> memory before writing to disk. What are your thoughts?
>> 
>> It's probably of limited utility.  If you're already batching on the client side,
>> you can achieve the same effect by sending in a larger batch.  I'm not opposed to it per se,
>> I just don't think it will help with throughput all that much.
> 
> :nod: given the new behaviour I'm inclined to agree.
> 
>> 
>>> 
>>> Now, this buffering is where the "implementation concerns" come in.
>>> According to the wiki:
>>> 
>>> "There is a query option batch=ok which can be used to achieve higher
>>> throughput at the cost of lower guarantees. When a PUT (or a document
>>> POST as described below) is sent using this option, it is not
>>> immediately written to disk. Instead it is stored in memory on a
>>> per-user basis for a second or so (or the number of docs in memory
>>> reaches a certain point). After the threshold has passed, the docs are
>>> committed to disk."
>>> 
>>> However, unless I'm missing something (quite likely ;-)), there is no
>>> "stored in memory on a per-user basis" or any check for when "the
>>> number of docs in memory reaches a certain point". All it seems to do
>>> is spawn a new process so the update happens when the Erlang scheduler
>>> gets around to it. In fact, I don't see any reference to the
>>> batch_save_interval and batch_save_size configuration options in the
>>> code.
>> 
>> The wiki describes the 0.10 implementation of batch=ok.  In 0.11 batch mode takes
>> advantage of the fact that couch_db_updater always merges all waiting updates to a DB into
>> a single write, and so doesn't bother with the separate set of supervised processes accumulating
>> documents.  In effect the 0.11 batch=ok is "I'm not going to wait around, but save this as
>> soon as you get a chance".
> 
> Ah, I didn't dig far enough into the code to see that happening.
> 
> So, purely for my understanding, it's now simplified to a delayed
> commit that happens at most 1000ms after normal changes are received.
> Anything that causes the commit to happen earlier cancels the pending
> commit.
> 
> Does that mean that batch="ok" with delayed_commits=false is meaningless?

So, we should distinguish between writes and fsyncs.  CouchDB 0.11 never waits to write; if
there is an update_docs message in couch_db_updater's mailbox it acts on that "immediately"
(that is, as soon as it finishes whatever else it's doing at the moment).  Moreover, it batches
together all the update_docs messages in its mailbox and does one write operation.  At the
end of this write operation the modified pages may not yet be flushed to disk; in fact they
almost certainly are not.  The kernel is caching them for a period of time.  That's where
fsync comes in.
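
To make the mailbox batching concrete, here is a minimal Python sketch of the idea. It is not the actual couch_db_updater code (which is Erlang); the names drain_mailbox, updater_step and write_op are hypothetical, and a queue.Queue stands in for the Erlang process mailbox.

```python
import queue


def drain_mailbox(mailbox):
    """Collect every update that is already waiting, without blocking."""
    batch = []
    while True:
        try:
            batch.append(mailbox.get_nowait())
        except queue.Empty:
            return batch


def updater_step(mailbox, write_op):
    """Wait for one update, merge whatever else is queued, then write once.

    write_op is a hypothetical callable that performs a single write
    operation for the whole batch. No fsync happens here.
    """
    first = mailbox.get()  # blocks until at least one update arrives
    batch = [first] + drain_mailbox(mailbox)
    write_op(batch)
    return len(batch)
```

If three updates are queued before updater_step runs, write_op is called once with all three, which is the "one write operation" described above.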

The delayed_commits setting controls the frequency with which CouchDB writes the DB header
and calls fsync.  If it is set to false, CouchDB syncs the file as soon as it completes a
write operation.  A write operation can be a single document update, or it can update multiple
documents in the case of concurrent writer threads, batch=ok, or _bulk_docs requests.  If
delayed_commits is set to true, CouchDB syncs the file at 1 second intervals (if an update
to the file has occurred in that interval, of course).
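
A rough Python sketch of the two delayed_commits modes follows. It is an illustration under assumptions, not CouchDB's implementation: the CommitPolicy class, its tick method and the injectable clock are all hypothetical, and the header write that accompanies each commit is left out.

```python
import os
import time


class CommitPolicy:
    """Hypothetical model of when fsync is called after a write."""

    def __init__(self, fd, delayed_commits, interval=1.0, clock=time.monotonic):
        self.fd = fd
        self.delayed_commits = delayed_commits
        self.interval = interval      # seconds between delayed commits
        self.clock = clock
        self.dirty_since = None       # time of the first unsynced write
        self.syncs = 0                # number of fsync calls made so far

    def write(self, data):
        os.write(self.fd, data)       # data may still sit in the kernel cache
        if not self.delayed_commits:
            self._sync()              # sync as soon as the write completes
        elif self.dirty_since is None:
            self.dirty_since = self.clock()

    def tick(self):
        """Call periodically; syncs only if an unsynced write is old enough."""
        if self.dirty_since is None:
            return
        if self.clock() - self.dirty_since >= self.interval:
            self._sync()

    def _sync(self):
        os.fsync(self.fd)
        self.dirty_since = None
        self.syncs += 1
```

With delayed_commits=False every write is followed by a sync; with delayed_commits=True several writes inside the same interval share one sync, and an idle file is never synced.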

batch=ok with delayed_commits=false is not quite meaningless, but you're right, you probably
won't sneak too many updates into a single commit unless fsync is really slow.  One example
is OS X, where Erlang's file:sync calls a different fcntl which actually forces the hard disk
to flush the data to spinning platters.  It's super-slow but more reliable than regular-old
fsync, which just gets the data from the kernel to the hard disk's cache.  If you have a non-volatile
disk cache on your Linux server that's cool, but a regular old consumer hard drive in your
MacBook does not have that luxury. 
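
For illustration, the same distinction can be sketched in Python. The "different fcntl" is presumably F_FULLFSYNC (the message above does not name it); Python's fcntl module only exposes that constant on platforms that define it, so this sketch falls back to a plain fsync elsewhere. The function name durable_sync is hypothetical.

```python
import fcntl
import os


def durable_sync(fd):
    """Flush fd as far down the storage stack as the platform allows.

    Returns the name of the mechanism used, for illustration only.
    """
    full = getattr(fcntl, "F_FULLFSYNC", None)
    if full is not None:
        # Assumed to be the OS X path: ask the drive itself to flush.
        fcntl.fcntl(fd, full)
        return "F_FULLFSYNC"
    # Elsewhere: hand the data from the kernel to the drive.
    os.fsync(fd)
    return "fsync"
```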

> Anyway, it sounds like the two batch_save config options should be
> removed from etc/couchdb/default.ini.tpl.in.

Yes.

>> This does change the performance characteristics quite a bit; in particular, when
>> the underlying disk is fast the new batch=ok behavior will result in significantly larger
>> uncompacted databases.
> 
> Agh, this suggests I didn't understand the updater's behaviour. A large
> uncompacted database normally means lots of small additions to the
> database file. How does fast disk speed affect that?

All I meant there was that if the disk is slow, you can dump a bunch of messages into couch_db_updater's
mailbox while it's talking to the disk.  When it finishes what it's doing and looks in the
mailbox, it'll batch everything in the mailbox together for the next write op.  This results
in a somewhat smaller DB file.  If the disk is fast couch_db_updater's mailbox will be mostly
empty, and it'll be doing a larger number of smaller operations.  Best,

Adam
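
The effect of disk speed on batch size can be shown with a toy simulation. This is a sketch under simplifying assumptions (every write takes the same fixed time, arrival times are sorted, and the updater merges everything queued when it becomes free); the function name count_write_ops is hypothetical.

```python
def count_write_ops(arrival_times, write_time):
    """Return the size of each write op for updates arriving at the given times.

    arrival_times must be sorted ascending. Each write op takes write_time
    and includes every update that has arrived by the time it starts.
    """
    batches = []
    now = 0.0
    i = 0
    n = len(arrival_times)
    while i < n:
        now = max(now, arrival_times[i])  # idle until the next update arrives
        j = i
        while j < n and arrival_times[j] <= now:
            j += 1                        # merge everything already queued
        batches.append(j - i)
        now += write_time                 # the disk is busy for this long
        i = j
    return batches
```

For ten updates arriving one time unit apart, a write time of 0.5 gives ten write ops of one update each, while a write time of 4 gives four write ops with sizes 1, 4, 4 and 1.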

>> 
>>> Shouldn't batch=ok send the doc off to some background process that
>>> accumulates docs until either the batch interval or size threshold has
>>> been reached? This would also ensure that batch=ok updates are handled
>>> in the order they arrive, although I'm not sure if that matters given
>>> that the user has basically said they don't care if it succeeds or not
>>> by using batch=ok.
>> 
>> I think the document updates are still handled in the order in which they were received.
>> 
>>> 
>>> - Matt
>> 
>> 
>> Best, Adam

