couchdb-dev mailing list archives

From Jens Alfke
Subject Re: Detailed info on the B-tree store? Native implementations thereof?
Date Tue, 11 Aug 2009 19:07:55 GMT

On Aug 11, 2009, at 10:37 AM, Chris Anderson wrote:

> Since this article, we've changed the header handling, so that we
> don't keep it at the top of the file, but instead append the header at
> the end of the file at every commit. The strict append-only nature of
> the storage engine is the source of its robustness. Even an extreme
> action, like truncating the file, will not result in an inconsistent
> state.

Interesting. Does this really guarantee file integrity even in the  
case of power failure? (I have some experience dealing with file  
corruption, from working on Mac OS X components that use sqlite.) The  
worst problem is that the disk controller will reorder sector writes  
to reduce seek time, which in effect means that if power is lost, some  
random subset of the last writes may not happen. So you won't just end  
up with a truncated file — you could have a file that seems intact and  
has a correct header at the end, but has 4k bytes of garbage somewhere  
within the last transaction. Does CouchDB's file structure guard  
against that?
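
To make the concern concrete, here is a minimal Python sketch of one generic way an append-only file could detect a damaged final transaction: each commit ends with a footer carrying a checksum of the transaction body, and recovery scans backward for the newest footer whose checksum still matches. The footer layout, the `HDR1` marker, and both function names are assumptions made up for illustration; this is not a description of CouchDB's actual file format.

```python
import struct
import zlib

MAGIC = b"HDR1"                   # hypothetical footer marker, not CouchDB's
FOOTER = struct.Struct(">4sII")   # marker, body length, CRC32 of the body


def append_commit(log: bytes, body: bytes) -> bytes:
    """Return the log with one transaction body and its footer appended."""
    return log + body + FOOTER.pack(MAGIC, len(body), zlib.crc32(body))


def last_valid_commit(log: bytes):
    """Scan backward for the newest footer whose checksum matches its body.

    Returns (body, end_offset), or None if no intact commit is found.
    """
    pos = log.rfind(MAGIC)
    while pos != -1:
        if pos + FOOTER.size <= len(log):          # footer is complete
            _, length, crc = FOOTER.unpack_from(log, pos)
            start = pos - length
            if start >= 0 and zlib.crc32(log[start:pos]) == crc:
                return log[start:pos], pos + FOOTER.size
        # Footer truncated or body damaged: fall back to an older commit.
        pos = log.rfind(MAGIC, 0, pos)
    return None
```

With a scheme like this, a truncated footer or a garbage sector inside the last body makes recovery fall back to the previous commit. It only verifies the body covered by each footer, so it says nothing about damage to older data that a later commit still points to.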

My concern with HTML5 local storage is that it's going to be used for  
important user data that cannot be lost, just the way native apps put  
irreplaceable data in local files. But the data stores being used to  
implement local storage are much less resilient than the filesystem  
itself. My experience with sqlite is that heavily-used databases on  
consumer machines get corrupted and lost every few months. (This isn't  
directly related to CouchDB itself; but it's why I'm interested in the  
fault-tolerant data store it uses.)

> The other aspect of our API that web storage will need in order to be
> concurrency-friendly is MVCC. Without MVCC you end up needing long
> transactions between page-loads, like localStorage currently has,
> which makes it useless for sharing state between windows.
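
For readers unfamiliar with the idea, the MVCC point can be sketched as optimistic concurrency: every write names the revision it was based on, and a stale revision is rejected instead of blocking anyone. The `MVCCStore` and `Conflict` names and the integer revisions below are hypothetical simplifications in Python, not CouchDB's API.

```python
class Conflict(Exception):
    """Raised when a write is based on a revision that is no longer current."""


class MVCCStore:
    def __init__(self):
        self._docs = {}  # key -> (revision, value)

    def get(self, key):
        """Return (revision, value); a missing key reads as revision 0."""
        return self._docs.get(key, (0, None))

    def put(self, key, value, expected_rev):
        """Store value only if the caller saw the current revision."""
        current_rev, _ = self.get(key)
        if expected_rev != current_rev:
            raise Conflict(f"{key}: expected rev {expected_rev}, "
                           f"current rev is {current_rev}")
        self._docs[key] = (current_rev + 1, value)
        return current_rev + 1
```

No lock is held between a read and the later write, so two windows can work on the same key; the slower writer gets a `Conflict` and has to re-read and retry.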

I'm still not 100% convinced by your analysis in that blog post. A  
script running in a web page will implicitly acquire a lock when it  
accesses local storage, and release the lock at the end of the current  
event that it's handling (i.e. a user action or XHR response.) This is  
sufficiently fine-grained as to not pose a problem, I think.
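
As a rough model of what I mean, here is a Python sketch in which the storage object takes its lock lazily on first access and the event dispatcher releases it when the handler returns. The `LocalStorage` class and the `dispatch` function are invented for this illustration and do not correspond to any browser's implementation.

```python
import threading


class LocalStorage:
    """Toy storage whose lock is taken implicitly on first access."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}
        self._held = threading.local()   # per-thread "I hold the lock" flag

    def _ensure_locked(self):
        if not getattr(self._held, "flag", False):
            self._lock.acquire()
            self._held.flag = True

    def get(self, key):
        self._ensure_locked()
        return self._data.get(key)

    def set(self, key, value):
        self._ensure_locked()
        self._data[key] = value

    def end_of_event(self):
        """Release the lock if this thread's current event acquired it."""
        if getattr(self._held, "flag", False):
            self._held.flag = False
            self._lock.release()


def dispatch(storage, handler, *args):
    """Run one event handler; the implicit lock ends with the event."""
    try:
        return handler(storage, *args)
    finally:
        storage.end_of_event()
```

A handler that reads and then writes a counter is atomic with respect to handlers in other threads, and the lock is never held between events.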

But Jeremy Orlow pointed out a more problematic case to me: the HTML5  
worker-thread API. Worker threads should be able to access local  
storage, and they don't have an event-based model; so a worker thread  
will probably be within some internal 'while' loop during its entire  
lifespan. There is thus no way to automatically handle transactions  
for it, so it will have to manually acquire and release locks. That  
means that a buggy or blocked worker thread could starve web pages in  
the same domain from accessing local storage. That's bad.
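
The starvation scenario can be sketched in Python with a plain lock standing in for the per-domain storage lock. The names `storage_lock`, `page_event_handler` and `stuck_worker` are hypothetical, and the timeout exists only so the example can report failure instead of hanging.

```python
import threading

storage_lock = threading.Lock()   # stands in for the per-domain storage lock


def page_event_handler(timeout=0.1):
    """Try to touch storage during one page event.

    Returns True if the lock was obtained, False if the page was starved.
    """
    if not storage_lock.acquire(timeout=timeout):
        return False
    try:
        return True               # storage access would happen here
    finally:
        storage_lock.release()


def stuck_worker(started, stop):
    """A worker that takes the lock manually and then blocks while holding it."""
    storage_lock.acquire()
    started.set()
    stop.wait()                   # simulates a buggy or blocked worker loop
    storage_lock.release()
```

While `stuck_worker` is blocked, every call to `page_event_handler` from another thread returns False; once the worker's `stop` event is set and it releases the lock, page handlers succeed again.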

> Maybe the easiest thing would be to just start bundling CouchDB with
> your browser. :)

In a lot of ways that would be really awesome. However, it would have  
a terrible effect on the download size of the browser, which is an  
important consideration. (IIRC, the all-in-one double-clickable Mac  
CouchDB package is something like 15MB.)

I like the idea, which I think you proposed, of putting a basic b-tree  
API into the browser, and being able to implement a lite storage  
system compatible with CouchDB on top of it in JS.
