couchdb-dev mailing list archives

From Adam Kocoloski <kocol...@apache.org>
Subject Re: Database integrity [Detailed info on the B-tree store? Native implementations thereof?]
Date Wed, 12 Aug 2009 04:20:17 GMT
On Aug 11, 2009, at 11:42 PM, Jens Alfke wrote:

>
> On Aug 11, 2009, at 5:03 PM, Damien Katz wrote:
>
>>> The worst problem is that the disk controller will reorder sector  
>>> writes to reduce seek time, which in effect means that if power is  
>>> lost, some random subset of the last writes may not happen. So you  
>>> won't just end up with a truncated file — you could have a file  
>>> that seems intact and has a correct header at the end, but has 4k  
>>> bytes of garbage somewhere within the last transaction. Does  
>>> CouchDB's file structure guard against that?
>>
>> First we fsync all the data and indexes, then we write and fsync  
>> the headers in a separate step.
>
> Cool. From my discussions with Apple filesystem guru Dominic  
> Giampaolo, I gather that this two-phase approach is the right way to  
> guarantee consistency. (It's also used by the HFS+ filesystem to  
> secure its journal.)
>
> The caveat is that the fsyncs have to be the paranoid kind that  
> flush the disk-controller cache, not just the OS kernel cache. (This  
> is what the nonstandard F_FULLFSYNC mode does in Darwin/OS X;  
> hopefully CouchDB knows to use that when built for that platform.)

Yep, we know to use that flag -- it was Jan who supplied a patch to
the Erlang/OTP team to get F_FULLFSYNC used by Erlang's standard file
module. If I recall correctly, we knew there was a problem when
CouchDB was claiming ~200 fsyncs/second on our laptops :-)

Cheers, Adam
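
The two-phase commit and the "paranoid" fsync discussed in this thread can be sketched roughly as follows. This is only an illustration in Python, not CouchDB's or Erlang/OTP's actual code: the names full_sync and commit are hypothetical, the file layout (body followed by a trailing header) is simplified, and the sketch assumes a POSIX system where Python's fcntl module is available. On platforms where fcntl does not expose F_FULLFSYNC, it falls back to a plain fsync, which may not flush the disk controller's cache.

```python
import fcntl
import os


def full_sync(fd):
    """Flush fd to disk, asking the drive to flush its own cache if we can.

    Hypothetical helper: F_FULLFSYNC is a Darwin/OS X fcntl command, so it
    is looked up dynamically and plain fsync is used everywhere else.
    """
    full = getattr(fcntl, "F_FULLFSYNC", None)
    if full is not None:
        try:
            fcntl.fcntl(fd, full)
            return
        except OSError:
            pass  # not supported on this filesystem; fall back below
    os.fsync(fd)


def write_all(fd, data):
    """Write every byte of data, since os.write may write only part of it."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def commit(fd, body, header):
    """Hypothetical two-phase append to a file opened for appending."""
    write_all(fd, body)
    full_sync(fd)  # phase 1: data and indexes are durable
    write_all(fd, header)
    full_sync(fd)  # phase 2: only now write and sync the header
```

The point of the ordering is that a header can never reach the disk before the data it refers to: if power is lost during phase 1, the new header has not been written yet, so a reader would still find the previous one.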

