incubator-couchdb-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: Why sequential document ids? [was: Re: What's the speed(performance) of couchdb?]
Date Thu, 26 Feb 2009 23:31:34 GMT
On Thu, Feb 26, 2009 at 5:58 PM, Barry Wark <barrywark@gmail.com> wrote:
> On Thu, Feb 26, 2009 at 11:18 AM, Damien Katz <damien@apache.org> wrote:
>>
>> On Feb 26, 2009, at 1:55 PM, Jan Lehnardt wrote:
>>
>>>
>>> On 26 Feb 2009, at 19:49, Barry Wark wrote:
>>>>>
>>>>> or ascending...
>>>>
>>>> As an asside, why is it that sequential document ids would produce a
>>>> significant performance boost? I suspect the answer is something
>>>> rather fundamental to CouchDB's design, and I'd like to try to grok
>>>> it.
>>>
>>> b-trees inner-nodes can get cached better if inserts basically always
>>> use the same path.
>>>
>>
>> What he said. It's pretty standard btree stuff, most, if not all the major
>> rdbms have similar issues with primary keys.
>>
>> Also, he Ids don't need to be sequential (1,2,3,4...), just ordered
>> (1,5,19,22...). And they don't need to sort higher or lower than all the
>> other ids, so long as they are clustered together.  The each btree nodes
>> that have to be loaded that isn't in cache is expensive. The more the keys
>> have to be inserted into random places in the btree, the worse the caching
>> behavior. Right now, with the crypto random UUIDs we generate, it's
>> basically the worst case scenario for doc inserts.
>
>
> Ah, of course. Thanks everyone!
>
> *thinking back to my CS undergrad days* (I don't come across this
> stuff much in my work these days...)
> Inserting with sequential IDs would lead to btree balance issues,
> though, wouldn't it? In that sense the crypto random UUIDs would seem
> to be the ideal for read performance in the future as opposed to
> insert performance. Am I on the right track?
>
> Thanks for the info,
> Barry
>>
>> -Damien
>>
>

Not sure i follow what you mean by balance issues. Seeing as its a
btree it can't be unbalanced because if it were it wouldn't be a
btree. But then our definitions of balance might be different. :D

The only performance difference between reading random and the
sequential document ids I can think of would favor the sequential side
when you're reading documents in roughly the same order as their ids
so we get FS cache hits.

HTH,
Paul Davis

Mime
View raw message