couchdb-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Davis <>
Subject Re: Why sequential document ids? [was: Re: What's the speed(performance) of couchdb?]
Date Fri, 27 Feb 2009 00:12:28 GMT
On Thu, Feb 26, 2009 at 6:54 PM, Chris Anderson <> wrote:
> On Thu, Feb 26, 2009 at 3:31 PM, Paul Davis <> wrote:
>> Not sure i follow what you mean by balance issues. Seeing as its a
>> btree it can't be unbalanced because if it were it wouldn't be a
>> btree. But then our definitions of balance might be different. :D
> When a B-tree is consistently inserted into, with a new highest (or
> lowest) value, you can end up with a deeper tree than when it is
> randomly inserted into. Imagine the inner-nodes all accumulating on
> the right side of the tree, so in the worst case you can end up with
> one lonely leaf node to the left of the root, and essentially a linked
> list of inner-nodes on the right side (such that cost would approach
> O(N) for inserts and lookups.

An unbalanced btree is by definition, not a btree. One of the major
defintion bullet points is that all leaf nodes are always exactly the
same depth.

Some interesting code that makes the rebalancing stick out is the code
in couch_btree:complete_root that 'caps the tree' if you will.

> My benchmarks show that inserts do not slow down as you would expect
> them to in this worst case, so Couch is probably doing something
> non-naive to rebalance the tree even when inserts are always at one
> extreme.
>> The only performance difference between reading random and the
>> sequential document ids I can think of would favor the sequential side
>> when you're reading documents in roughly the same order as their ids
>> so we get FS cache hits.
> From a practical standpoint, you're probably right about this
> (assuming the tree remains shallow even with inserts consistently at
> one edge of the tree.)
> Chris
> --
> Chris Anderson

Paul Davis

View raw message