couchdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Randall Leeds <randall.le...@gmail.com>
Subject Re: Understanding the CouchDB file format
Date Fri, 23 Dec 2011 01:10:34 GMT
On Thu, Dec 22, 2011 at 12:11, Riyad Kalla <rkalla@gmail.com> wrote:
> On Thu, Dec 22, 2011 at 12:38 PM, Robert Newson <rnewson@apache.org> wrote:
>
>> There are a
>> few parts of the article that are inaccurate (like the assertion we
>> have good locality for the id and seq trees. If this were true we
>> wouldn't have seen such a huge improvement in compaction by
>> temporarily separating them).
>
>
> I'd look forward to more detail on this... it was my understanding the
> updates were appended out in the [doc rev][_id idx diff][_seq idx diff]
> format at the end of the data file. Sounds like I may have misunderstood
> that?
>

Riyad, as you pointed out earlier, all the inner nodes are rewritten
up to the root. The two btrees are not written in parallel, though,
which means that for deep trees all the updated nodes are written
before the other tree's nodes are written. Also remember that the
trees themselves end up pretty fragmented since older nodes that
haven't changed are back toward the beginning of the file. In general,
I'm not sure there's much that's useful to mention about locality in
the trees. Also, updating these trees requires reading the old values,
so there is still seeking that occurs (if the pages aren't cached by
the OS).

>
>> The 'COMPETE recreation' paragraph also
>> strikes me as factually awry.
>>
>
> I'd appreciate a detailed correction on this if it is wrong; all the
> digging I've done (in this thread and other partial resources) suggests
> that the path from the changed doc ref back to the root (including a copy
> of all internal nodes and all of their child references) is written so as
> being able to read-back into the index from the tail of the data file
> quickly... specifically slides 17, 18 and 19 from this slidedeck (
> http://www.slideshare.net/jchrisa/btree-nosql-oak?from=embed) -- note that
> the interim nodes [A..M] and [A..Z] are rewritten (including any and all
> child pointers they contained).
>
> This is what I was referring to; I should either clean up my wording
> (confusing) or I got it wrong in which case I'd appreciate any and all
> corrections.

Right. It mostly seems a bit confusing to me.
"it DOES NOT just rewrite the nodes pathing from the leaf to the node
and ONLY the connections for that single document"
That doesn't sound quite right, but I can tell what you're trying to
say is accurate. If I'm right, you mean that every changed inner node
is rewritten in its entirety rather than having a single pointer to
the new child updated in place.

Cheers. Thanks for taking the time to write this up.

-Randall

Mime
View raw message