couchdb-dev mailing list archives

From Adam Kocoloski <>
Subject Fwd: optimal settings for [couchdb] fsync_options?
Date Wed, 14 Apr 2010 12:52:01 GMT
Initially posted on user@, but maybe it got lost in the noise.  Does anyone know why we call
fsync when we open a file?


Begin forwarded message:

> From: Adam Kocoloski <>
> Date: April 11, 2010 10:44:03 PM EDT
> To:
> Subject: optimal settings for [couchdb] fsync_options?
> Hi folks, I wanted to assemble some concrete information about the
> purpose of each of the three fsync_options available in CouchDB and
> under what conditions they should be enabled/disabled. These options
> are:
> 1) before_header - calls file:sync(Fd) before writing a DB header to
> disk.  I believe the goal here is to prevent DB corruption by ensuring
> that all the data referred to by the header is durably stored before
> the header is written.  A system that preserves write ordering could
> safely disable this option.  Does anyone know an example of such a
> system? Perhaps a combination of a noop IO scheduler and a
> write-through or nonvolatile disk cache?
> 2) after_header - calls file:sync(Fd) immediately after writing the DB
> header.  I think this one is done so that we don't lose too much data
> following a CouchDB restart, and so that a client can ensure that
> stored data will be retrievable after a restart by POSTing to
> /db/_ensure_full_commit.  It might make sense to disable this option
> if e.g. you're relying on replication for durability.  Although that's
> dicey because the replicator calls ensure_full_commit for both DBs
> before writing its own checkpoint record*, and by disabling the
> after_header option you'd run the risk of skipping updates on the
> target in the face of a power failure.
> 3) on_file_open - calls file:sync(Fd) immediately after opening a DB
> file.  I really don't know the purpose of this one.  Anyone?
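
To make the ordering of the three sync points concrete, here is a minimal sketch in Python rather than Erlang. It is not CouchDB's code: open_db, commit, and the sync parameter are hypothetical names, and os.fsync merely stands in for file:sync(Fd). It only shows where each option would place a sync relative to the header write.

```python
import os

def open_db(path, options, sync=os.fsync):
    """Open an append-only file; optionally sync right after opening."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_APPEND, 0o644)
    if "on_file_open" in options:
        sync(fd)  # the option whose purpose the message asks about
    return fd

def commit(fd, body, header, options, sync=os.fsync):
    """Append the data, then the header that refers to it."""
    os.write(fd, body)
    if "before_header" in options:
        sync(fd)  # data is durable before a header can point at it
    os.write(fd, header)
    if "after_header" in options:
        sync(fd)  # header is durable before the commit is reported
```

With before_header disabled, nothing in this sketch stops the header from reaching the disk ahead of the data it refers to, which is why that option could only be dropped on a stack that preserves write ordering.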
> Best, Adam
> * The reason the replicator calls ensure_full_commit on the source is
> to detect situations where update_seqs might be reused.  I wonder if
> we could engineer a way around that ever happening, for example by
> ensuring that on restart the update sequence jumps by a large number.
> But that's a discussion for dev@.
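
The "jump by a large number" idea in the footnote can be sketched as follows. This is only an illustration of the arithmetic, not an existing CouchDB mechanism: SEQ_JUMP, seq_after_restart, and reuse_is_avoided are hypothetical names, and the gap size is an arbitrary assumption.

```python
SEQ_JUMP = 1_000_000  # hypothetical gap; would have to exceed any uncommitted backlog

def seq_after_restart(last_durable_seq, jump=SEQ_JUMP):
    """Sequence to resume from after a restart, skipping past any
    sequences that were handed out but never made durable."""
    return last_durable_seq + jump

def reuse_is_avoided(last_durable_seq, highest_seq_handed_out, jump=SEQ_JUMP):
    """True if every new update gets a sequence above anything a
    replicator could have seen before the crash."""
    return highest_seq_handed_out <= seq_after_restart(last_durable_seq, jump)
```

For example, if the last durable header recorded update_seq 100 and the server had handed out sequences up to 150 before losing power, resuming from 1000100 leaves 101 through 150 unused. The scheme only holds while the number of uncommitted updates stays below the jump.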
