couchdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Robert Dionne <dio...@dionne-associates.com>
Subject Re: Understanding the CouchDB file format
Date Tue, 20 Dec 2011 20:35:44 GMT

Robert Dionne
Computer Programmer
dionne@dionne-associates.com
203.231.9961




On Dec 20, 2011, at 3:27 PM, Riyad Kalla wrote:

> Filipe,
> 
> Thank you for the reply.
> 
> Maybe I am misunderstanding exactly what couch is writing out; the docs
> I've read say that it "rewrites the root node" -- I can't tell if the docs
> mean the parent node of the child doc that was changed (as one of the b+
> leaves) or if it means the direct path, from the root, to that individual
> doc... or if it means the *entire* index...
> 
> In the case of even rewriting the single parent, with such a shallow tree,
> each internal leaf will have a huge fan of nodes; let's say 1-10k in a
> decent sized data set.
> 
> If you are seeing a few K of extra written out after each changed doc then
> that cannot be write... I almost assumed my understanding was wrong because
> the sheer volume of data would make performance abysmal if it was true.
> 
> Given that... is it just the changed path, from the root to the new leaf
> that is rewritten?

Hi Riyad,

You are correct, it's only the changed path. Interestingly I've just started to document all
these internals[1] along with links to the code and other references available. 

Cheers,

Bob


[1] http://bdionne.github.com/couchdb/

> That makes me all sorts of curious as to how Couch
> updates/searches the new modified index with the small diff that is written
> out.
> 
> Any pointers to reading that will help me dig down on this (even early bugs
> in JIRA?) would be appreciated. I've tried skimming back in 2007/08 on
> Damien's blog to see if it wrote about it in depth and so far haven't found
> anything as detailed as I am hoping for on this architecture.
> 
> Best,
> Riyad
> 
> On Tue, Dec 20, 2011 at 1:07 PM, Filipe David Manana <fdmanana@apache.org>wrote:
> 
>> On Tue, Dec 20, 2011 at 6:24 PM, Riyad Kalla <rkalla@gmail.com> wrote:
>>> I've been reading everything I can find on the CouchDB file format[1] and
>>> am getting bits and pieces here and there, but not a great, concrete,
>>> step-by-step explanation of the process.
>>> 
>>> I'm clear on the use of B+ trees and after reading a few papers on the
>>> benefits of log-structured file formats, I understand the benefits of
>>> inlining the B+ tree indices directly into the data file as well
>> (locality
>>> + sequential I/O)... what I'm flummoxed about is how much of the B+
>> tree's
>>> index is rewritten after every modified document.
>>> 
>>> Consider a CouchDB file that looks more or less like this:
>>> 
>>> [idx/header][doc1, rev1][idx/header][doc1, rev2]....
>>> 
>>> After each revised doc is written and the "b-tree root" is rewritten
>> after
>>> that, is that just a modified root node of the B+ tree or the entire B+
>>> tree?
>>> 
>>> The reason I ask is because regardless of the answer to my previous
>>> question, for a *huge* database will millions of records, that seems like
>>> an enormous amount of data to rewrite after every modification. Say the
>>> root node had a fanning factor of 133; that would still be alot of data
>> to
>>> rewrite.
>> 
>> Hi Riyad,
>> 
>> Have you observed that in practice?
>> 
>> Typically the depth of database btrees is not that high even for
>> millions of documents. For example I have one around with about 10
>> million documents which doesn't have more than 5 or 6 levels if I
>> recall correctly.
>> 
>> So updating a doc, for that particular case, means rewriting 5 or 6
>> new nodes plus the document itself. Each node is normally not much
>> bigger than 1.2Kb.
>> 
>> I've written once a tool to analyze database files which reports btree
>> depths, however it's not updated to work with recent changes on
>> master/1.2.x such as snappy compression and btree sizes:
>> 
>> https://github.com/fdmanana/couchfoo
>> 
>> It should work with CouchDB 1.1 (and older) database files.
>> 
>>> 
>>> I am certain I am missing the boat on this; if anyone can pull me out of
>>> the water and point me to dry land I'd appreciate it.
>>> 
>>> Best,
>>> R
>>> 
>>> 
>>> 
>>> [1]
>>> --
>>> 
>> http://jchrisa.net/drl/_design/sofa/_list/post/post-page?startkey=%5B%22CouchDB-Implements-a-Fundamental-Algorithm%22%5D
>>> -- http://horicky.blogspot.com/2008/10/couchdb-implementation.html
>>> -- http://blog.kodekabuki.com/post/132952897/couchdb-naked
>>> -- http://guide.couchdb.org/editions/1/en/btree.html
>>> -- http://ayende.com/blog/* (Over my head)
>> 
>> 
>> 
>> --
>> Filipe David Manana,
>> 
>> "Reasonable men adapt themselves to the world.
>> Unreasonable men adapt the world to themselves.
>> That's why all progress depends on unreasonable men."
>> 


Mime
View raw message