couchdb-dev mailing list archives

From Riyad Kalla
Subject Understanding the CouchDB file format
Date Tue, 20 Dec 2011 18:24:13 GMT
I've been reading everything I can find on the CouchDB file format[1] and
am getting bits and pieces here and there, but not a great, concrete,
step-by-step explanation of the process.

I'm clear on the use of B+ trees and after reading a few papers on the
benefits of log-structured file formats, I understand the benefits of
inlining the B+ tree indices directly into the data file as well (locality
+ sequential I/O)... what I'm flummoxed about is how much of the B+ tree's
index is rewritten after every modified document.

Consider a CouchDB file that looks more or less like this:

[idx/header][doc1, rev1][idx/header][doc1, rev2]....

After each revised doc is written and the "b-tree root" is rewritten after
that, is that just a modified root node of the B+ tree or the entire B+
tree?

The reason I ask is because regardless of the answer to my previous
question, for a *huge* database with millions of records, that seems like
an enormous amount of data to rewrite after every modification. Say the
root node had a fanning factor of 133; that would still be a lot of data to
rewrite.
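
To put rough numbers on the two possibilities, here is a back-of-the-envelope
sketch. It assumes an idealized tree in which every node is completely full
and holds exactly 133 entries, and it compares rewriting a single
root-to-leaf path against rewriting every node. The function name
level_sizes and the 10-million-record figure are made up for illustration;
nothing here is taken from CouchDB's actual code or on-disk format.

```python
def level_sizes(num_records: int, fanout: int) -> list[int]:
    """Node count per level (leaves first, root last) for an idealized,
    completely full tree. Hypothetical helper, not CouchDB's layout."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    sizes = []
    # Ceiling division: number of leaf nodes needed to hold all records.
    nodes = max(1, -(-num_records // fanout))
    while True:
        sizes.append(nodes)
        if nodes == 1:  # reached the root
            break
        # Each parent level needs one node per `fanout` children.
        nodes = -(-nodes // fanout)
    return sizes


sizes = level_sizes(10_000_000, 133)
path_nodes = len(sizes)   # nodes on one root-to-leaf path
all_nodes = sum(sizes)    # nodes in the whole tree
print(f"levels: {sizes}")
print(f"one path: {path_nodes} nodes, whole tree: {all_nodes} nodes")
```

Under these assumptions, 10 million records need 4 levels, so a single
root-to-leaf path touches 4 nodes while the whole tree has roughly 75,760.
Which of the two cases CouchDB falls into is exactly the open question.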

I am certain I am missing the boat on this; if anyone can pull me out of
the water and point me to dry land I'd appreciate it.


--* (Over my head)
