couchdb-user mailing list archives

From Adam Kocoloski <kocol...@apache.org>
Subject optimal settings for [couchdb] fsync_options?
Date Mon, 12 Apr 2010 02:44:03 GMT
Hi folks, I wanted to assemble some concrete information about the purpose of each of the three
fsync_options available in CouchDB and the conditions under which each should be enabled or disabled.
These options are:

1) before_header - calls file:sync(Fd) before writing a DB header to disk.  I believe the
goal here is to prevent DB corruption by ensuring that all the data referred to by the header
is durably stored before the header is written.  A system that preserves write ordering could
safely disable this option.  Does anyone know an example of such a system? Perhaps a combination
of a noop IO scheduler and a write-through or nonvolatile disk cache?
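To make the ordering concrete, here is a rough Python model of one commit, with os.fsync standing in for Erlang's file:sync(Fd). The commit function and its before_header/after_header flags are hypothetical. This is not CouchDB's code, and it ignores how CouchDB actually lays out headers in the file. It only shows where each sync would sit relative to the data and header writes.

```python
import os


def commit(fd: int, data: bytes, header: bytes,
           before_header: bool = True, after_header: bool = True) -> list:
    """Hypothetical model: append data, then a header that refers to it.

    Returns the list of operations performed so the ordering can be inspected.
    """
    ops = []
    os.write(fd, data)          # new docs / btree nodes
    ops.append("write data")
    if before_header:
        # Make the data durable before any header can point at it, so a
        # crash never leaves a header referring to unwritten data.
        os.fsync(fd)
        ops.append("fsync")
    os.write(fd, header)        # header referencing the data above
    ops.append("write header")
    if after_header:
        # Make the header itself durable, so this commit survives a restart.
        os.fsync(fd)
        ops.append("fsync")
    return ops
```

Disabling before_header in this model drops the first sync. That is only safe if the OS and disk guarantee the data write reaches stable storage before the header write does.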

2) after_header - calls file:sync(Fd) immediately after writing the DB header.  I think this
one exists so that we don't lose too much data following a CouchDB restart, and so that a
client can ensure that stored data will be retrievable after a restart by POSTing to /db/_ensure_full_commit.
It might make sense to disable this option if, e.g., you're relying on replication for durability.
That's dicey, though, because the replicator calls ensure_full_commit on both DBs before
writing its own checkpoint record*, so by disabling after_header you'd run the
risk of skipping updates on the target in the event of a power failure.
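For reference, a client-side sketch of the _ensure_full_commit call mentioned above, using only Python's standard library. The function name and the example server URL are assumptions for illustration. The database name is percent-encoded so names containing "/" still form a valid path. The sketch only builds the request; it does not send it.

```python
import urllib.parse
import urllib.request


def ensure_full_commit_request(base_url: str, db: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /<db>/_ensure_full_commit."""
    db_path = urllib.parse.quote(db, safe="")  # "my/db" -> "my%2Fdb"
    url = f"{base_url.rstrip('/')}/{db_path}/_ensure_full_commit"
    return urllib.request.Request(
        url,
        data=b"",                                   # empty body
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

Sending it would be `urllib.request.urlopen(ensure_full_commit_request("http://127.0.0.1:5984", "mydb"))`. If after_header were disabled, a successful response would no longer imply the header had been synced to disk.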

3) on_file_open - calls file:sync(Fd) immediately after opening a DB file.  I really don't
know the purpose of this one.  Anyone?

Best, Adam

* The reason the replicator calls ensure_full_commit on the source is to detect situations
where update_seqs might be reused.  I wonder if we could engineer a way around that ever happening,
for example by ensuring that on restart the update sequence jumps by a large number.  But
that's a discussion for dev@.
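A toy model of the hazard in the footnote, and of the "jump on restart" idea. UpdateSeqCounter and restart_jump are hypothetical names, not CouchDB internals. Updates after the last synced header are lost in a crash. Without a jump, new updates reuse sequence numbers at or below a checkpoint the replicator already recorded, so it would skip them.

```python
class UpdateSeqCounter:
    """Hypothetical model of a DB update sequence across crashes."""

    def __init__(self, restart_jump: int = 0):
        self.durable = 0            # last seq whose header reached disk
        self.current = 0            # last seq handed out (may not be durable)
        self.restart_jump = restart_jump

    def update(self) -> int:
        self.current += 1
        return self.current

    def commit(self) -> None:
        self.durable = self.current  # header synced

    def crash_and_restart(self) -> None:
        # Unsynced updates are lost.  Jumping forward (assumed to exceed the
        # number of lost updates) avoids reissuing seqs seen before the crash.
        self.current = self.durable + self.restart_jump


src = UpdateSeqCounter()
src.update(); src.commit()           # seq 1 durable
src.update(); src.update()           # seqs 2, 3 not yet durable
checkpoint = src.current             # replicator records 3
src.crash_and_restart()
print(src.update() <= checkpoint)    # True: reused seq 2 would be skipped
```

With `UpdateSeqCounter(restart_jump=1000)` the first post-restart update gets seq 1002, which is above any checkpoint taken before the crash.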