couchdb-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Anderson <jch...@apache.org>
Subject Re: Why sequential document ids? [was: Re: What's the speed(performance) of couchdb?]
Date Thu, 26 Feb 2009 23:54:23 GMT
On Thu, Feb 26, 2009 at 3:31 PM, Paul Davis <paul.joseph.davis@gmail.com> wrote:
>
> Not sure i follow what you mean by balance issues. Seeing as its a
> btree it can't be unbalanced because if it were it wouldn't be a
> btree. But then our definitions of balance might be different. :D

When a B-tree is consistently inserted into, with a new highest (or
lowest) value, you can end up with a deeper tree than when it is
randomly inserted into. Imagine the inner-nodes all accumulating on
the right side of the tree, so in the worst case you can end up with
one lonely leaf node to the left of the root, and essentially a linked
list of inner-nodes on the right side (such that cost would approach
O(N) for inserts and lookups.

My benchmarks show that inserts do not slow down as you would expect
them to in this worst case, so Couch is probably doing something
non-naive to rebalance the tree even when inserts are always at one
extreme.

>
> The only performance difference between reading random and the
> sequential document ids I can think of would favor the sequential side
> when you're reading documents in roughly the same order as their ids
> so we get FS cache hits.

>From a practical standpoint, you're probably right about this
(assuming the tree remains shallow even with inserts consistently at
one edge of the tree.)

Chris

-- 
Chris Anderson
http://jchris.mfdz.com

Mime
View raw message