couchdb-dev mailing list archives

From Jens Alfke
Subject Re: Database integrity [Detailed info on the B-tree store? Native implementations thereof?]
Date Wed, 12 Aug 2009 03:42:29 GMT

On Aug 11, 2009, at 5:03 PM, Damien Katz wrote:

>> The worst problem is that the disk controller will reorder sector  
>> writes to reduce seek time, which in effect means that if power is  
>> lost, some random subset of the last writes may not happen. So you  
>> won't just end up with a truncated file — you could have a file  
>> that seems intact and has a correct header at the end, but has 4k  
>> bytes of garbage somewhere within the last transaction. Does  
>> CouchDB's file structure guard against that?
> First we fsync all the data and indexes, then we write and fsync the  
> headers in a separate step.

Cool. From my discussions with Apple filesystem guru Dominic  
Giampaolo, I gather that this two-phase approach is the right way to  
guarantee consistency. (It's also used by the HFS+ filesystem to  
secure its journal.)
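
For illustration only, the ordering described above (data first, then the header, each with its own fsync) can be sketched as follows. This is not CouchDB's code (CouchDB is written in Erlang); the function names `commit` and `_write_all` and the "header appended after the data" layout are assumptions made for this example.

```python
import os


def _write_all(fd: int, buf: bytes) -> None:
    """Write the whole buffer, looping because os.write may write only part of it."""
    view = memoryview(buf)
    while len(view) > 0:
        written = os.write(fd, view)
        view = view[written:]


def commit(path: str, data: bytes, header: bytes) -> None:
    """Hypothetical two-phase commit: the header is only written once the data is synced."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        # Phase 1: append the documents/index nodes and wait until they are flushed.
        _write_all(fd, data)
        os.fsync(fd)
        # Phase 2: append the header that points at the data, then flush again.
        _write_all(fd, header)
        os.fsync(fd)
    finally:
        os.close(fd)
```

If power is lost during phase 1, no new header exists yet, so a reader that trusts only the last valid header still sees the previous consistent state.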

The caveat is that the fsyncs have to be the paranoid kind that flush  
the disk-controller cache, not just the OS kernel cache. (This is what  
the nonstandard F_FULLFSYNC mode does in Darwin/OS X; hopefully  
CouchDB knows to use that when built for that platform.)
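
As a rough sketch of what "the paranoid kind" of fsync could look like, the helper below asks for F_FULLFSYNC where the platform exposes it and falls back to a plain fsync elsewhere. The name `full_fsync` is hypothetical, and the fallback on error is an assumption about how a caller might want to behave on filesystems that reject the request.

```python
import fcntl
import os


def full_fsync(fd: int) -> None:
    """Flush fd as far down as the platform allows (hypothetical helper)."""
    # fcntl.F_FULLFSYNC is only defined on Darwin/OS X builds of Python.
    full = getattr(fcntl, "F_FULLFSYNC", None)
    if full is not None:
        try:
            # Asks the drive to flush its own write cache, not just the kernel's.
            fcntl.fcntl(fd, full)
            return
        except OSError:
            # Assumption: if the filesystem refuses, fall back to an ordinary fsync.
            pass
    os.fsync(fd)
```

On platforms without F_FULLFSYNC this is no stronger than os.fsync, so whether the controller cache is flushed depends on the OS and drive.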
