couchdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Damien Katz <>
Subject Re: svn commit: r897509 - /couchdb/trunk/etc/couchdb/
Date Tue, 12 Jan 2010 20:09:31 GMT
Your analysis is correct. Just to clarify, the btrees are *mostly* self balancing, but it omits
the 2 expensive rotations on btree rebalancing, which are used when removing keys from the
tree. It is possible for btrees to become unbalanced, but a simple compaction does fix it.
However, it needs to recompute the reductions, and for views that means recalling out to JS.
For database compaction, the reductions are very cheap CPU wise, so it's not a big there.
But if the btrees were fully balanced, then there no need to recompute reductions during compaction,
just copy the nodes and their existing reduction values.


On Jan 12, 2010, at 4:49 AM, Robert Dionne wrote:

> I don't think so, perhaps I misspoke. What I meant was the comment, "a balancing condition",
implied that it was balanced and was merely missing a particular case. It's not balanced in
the textbook sense of having an order n that determines lower and upper bounds on the number
of nodes at each level.
> The amount of branching is determined by a "chunkify" function that write out chunks
of a constant size and so will depend on the key size of kv nodes and the reductions for the
kp nodes. 
> So when deletions occur it will be unbalanced in the sense that compaction results in
a different b-tree which requires the reductions to be computed again. I believe the intent
of #224 is that if the b-tree is balanced in the text book sense that compaction can just
be a copy of the nodes.
> I haven't looked at this in some time, and it requires a very detailed analysis to see
if it might in fact be improved, particularly with respect to views. I'd love to hear Damien's
or Jan's or Paul's opinion on this. When I looked at this last summer I noted there been considerable
research in trees used in databases in recent years.  
> On Jan 12, 2010, at 4:02 AM, Brian Candler wrote:
>> On Mon, Jan 11, 2010 at 07:58:10AM -0500, Robert Dionne wrote:
>>> jira-224 makes an odd claim that it misses "a balancing condition", it
>>> doesn't balance at all so I'm not sure what this means.
>> If it's not balanced at all, does that mean there are degenerate cases which
>> could end up with having to do a linear search through the nodes?
>>   .
>>    ||||
>>        \
>>         ||||
>>             \
>>              ||||
>>                  \
>>                   ||||
>> Could inserting with strictly sequential ids cause that condition to occur?
>> (As would be the case with utc_random)

View raw message