On Thu, Feb 26, 2009 at 5:58 PM, Barry Wark <barrywark@gmail.com> wrote:
> On Thu, Feb 26, 2009 at 11:18 AM, Damien Katz <damien@apache.org> wrote:
>>
>> On Feb 26, 2009, at 1:55 PM, Jan Lehnardt wrote:
>>
>>>
>>> On 26 Feb 2009, at 19:49, Barry Wark wrote:
>>>>>
>>>>> or ascending...
>>>>
>>>> As an asside, why is it that sequential document ids would produce a
>>>> significant performance boost? I suspect the answer is something
>>>> rather fundamental to CouchDB's design, and I'd like to try to grok
>>>> it.
>>>
>>> b-trees inner-nodes can get cached better if inserts basically always
>>> use the same path.
>>>
>>
>> What he said. It's pretty standard btree stuff, most, if not all the major
>> rdbms have similar issues with primary keys.
>>
>> Also, he Ids don't need to be sequential (1,2,3,4...), just ordered
>> (1,5,19,22...). And they don't need to sort higher or lower than all the
>> other ids, so long as they are clustered together. The each btree nodes
>> that have to be loaded that isn't in cache is expensive. The more the keys
>> have to be inserted into random places in the btree, the worse the caching
>> behavior. Right now, with the crypto random UUIDs we generate, it's
>> basically the worst case scenario for doc inserts.
>
>
> Ah, of course. Thanks everyone!
>
> *thinking back to my CS undergrad days* (I don't come across this
> stuff much in my work these days...)
> Inserting with sequential IDs would lead to btree balance issues,
> though, wouldn't it? In that sense the crypto random UUIDs would seem
> to be the ideal for read performance in the future as opposed to
> insert performance. Am I on the right track?
>
> Thanks for the info,
> Barry
>>
>> -Damien
>>
>
Not sure i follow what you mean by balance issues. Seeing as its a
btree it can't be unbalanced because if it were it wouldn't be a
btree. But then our definitions of balance might be different. :D
The only performance difference between reading random and the
sequential document ids I can think of would favor the sequential side
when you're reading documents in roughly the same order as their ids
so we get FS cache hits.
HTH,
Paul Davis
|