incubator-couchdb-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Barry Wark <barryw...@gmail.com>
Subject Re: Why sequential document ids? [was: Re: What's the speed(performance) of couchdb?]
Date Thu, 26 Feb 2009 22:58:46 GMT
On Thu, Feb 26, 2009 at 11:18 AM, Damien Katz <damien@apache.org> wrote:
>
> On Feb 26, 2009, at 1:55 PM, Jan Lehnardt wrote:
>
>>
>> On 26 Feb 2009, at 19:49, Barry Wark wrote:
>>>>
>>>> or ascending...
>>>
>>> As an asside, why is it that sequential document ids would produce a
>>> significant performance boost? I suspect the answer is something
>>> rather fundamental to CouchDB's design, and I'd like to try to grok
>>> it.
>>
>> b-trees inner-nodes can get cached better if inserts basically always
>> use the same path.
>>
>
> What he said. It's pretty standard btree stuff, most, if not all the major
> rdbms have similar issues with primary keys.
>
> Also, he Ids don't need to be sequential (1,2,3,4...), just ordered
> (1,5,19,22...). And they don't need to sort higher or lower than all the
> other ids, so long as they are clustered together.  The each btree nodes
> that have to be loaded that isn't in cache is expensive. The more the keys
> have to be inserted into random places in the btree, the worse the caching
> behavior. Right now, with the crypto random UUIDs we generate, it's
> basically the worst case scenario for doc inserts.


Ah, of course. Thanks everyone!

*thinking back to my CS undergrad days* (I don't come across this
stuff much in my work these days...)
Inserting with sequential IDs would lead to btree balance issues,
though, wouldn't it? In that sense the crypto random UUIDs would seem
to be the ideal for read performance in the future as opposed to
insert performance. Am I on the right track?

Thanks for the info,
Barry
>
> -Damien
>

Mime
View raw message