couchdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Robert Dionne <>
Subject Re: svn commit: r897509 - /couchdb/trunk/etc/couchdb/
Date Mon, 11 Jan 2010 12:58:10 GMT
I suspect the assertion that key size matters is valid, though to what extent I'm not sure.
There is an open ticket to improve the b-tree (#224).

I spent some time last summer reviewing couch_btree. It's quite elegant in that its passes
inserts/deletes/lookups down through the recursion at the same time. However it's not balanced,
technically (per definitions in Knuth) it's not a b-tree, it has no order. The number of nodes
written at each level is determined by the number to write and a constant, called chunk size.

jira-224 makes an odd claim that it misses "a balancing condition", it doesn't balance at
all so I'm not sure what this means.  #224 asserts that if it were balanced compaction would
go much faster as it would remove the need to recompute reductions, which are stored with
internal nodes and computed as the tree is constructed from the leaves up.

I assumed that balancing is less important due to the append only nature. Ironically what
makes the implementation quite elegant also makes balancing non-trivial. The delete case in
particular is hard because one wants to do splitting of nodes opportunistically on the way
down the recursion to minimize disk access.

The change to use a sequentially increasing random ids improves locality a lot, especially
in bulk inserts, but I would be -1 on changing the default.



On Jan 11, 2010, at 4:39 AM, Robert Newson wrote:

> Roger,
> Unless I misread you, you are implying that _id is stored increasingly
> less inefficiently in the .couch file as its length increases? I don't
> think, unless you've really dug into the disk structure, that this
> assertions will hold. A better measure is the number of btree nodes on
> disk for the different kinds of _id. I would expect to see more nodes
> (and a higher rate of 'turnover') for large random ids than equally as
> large sequential ids. Large random ids versus short sequential ids
> should show a larger difference but for two unrelated reasons.
> B.
> On Mon, Jan 11, 2010 at 9:33 AM, Robert Newson <> wrote:
>> I should point out that the sequential algorithm in couchdb is
>> carefully constructed so that generated ids won't clash, even in a
>> distributed system. You might have assumed that the sequential ids
>> were 1, 2, 3, 4, ... and so on, but they are not.
>> The sequential ids are the same length as the random ids (16 bytes).
>> The first 13 bytes stay the same for around 8000 generated ids and is
>> then rerandomized. The ids with the same prefix have suffixes in
>> strictly increasing numeric order. This characteristic (that a new id
>> is numerically close to the previous id) is what helps with insertion
>> speed and general b-tree performance.
>> Before changing the default I think it would be worth getting numbers
>> from a suitably fair benchmark, I would still advocate random as the
>> default until that is done.
>> B.
>> On Mon, Jan 11, 2010 at 12:51 AM, Chris Anderson <> wrote:
>>> On Sun, Jan 10, 2010 at 4:24 PM, Roger Binns <> wrote:
>>>> Hash: SHA1
>>>> Chris Anderson wrote:
>>>>> I'm not feeling super-strong about this. However, making the default
>>>>> sequential seems like it will preempt a lot of the problems people
>>>>> tend to show up asking about.
>>> If we think that speed and size are more important than randomness, we
>>> should continue to refine uuid generators.
>>> Roger, if you can make a short sequential that'd be neat.
>>>> There are several issues conflated together:
>>>> - - When doing inserts, sorted ids are faster
>>>> - - The resulting size of the db file is the size of the docs plus a multiple
>>>> of the _id size (and probably an exponential of the size)
>>>> - - Sequential ids give small _id
>>>> - - Random ids give large _id
>>>> - - Sequentials will clash between different dbs (consider replication,
>>>> multiple instances etc).  They'll also lead people to rely on this
>>>> functionality as though it was like a SQL primary key
>>>> - - Random ids won't clash and better illustrate how CouchDB really works
>>>>> I think the info-leakage argument is overblown
>>>> It does make URLs easy to guess like many ecommerce sites that didn't
>>>> validate when showing you an invoice - you added one to the numeric id in
>>>> the URL and got to see someone elses.
>>>> I would far prefer the size of the db file and the size of the _id link
>>>> being addressed.  Because the _id size can cause the db file to get so big,
>>>> I/O etc is a lot slower mainly because there is just so much more file to
>>>> deal with!  (In my original posting I had a db go from 21GB to 4GB by
>>>> reducings ids from 16 bytes to 4 bytes.)
>>>> Roger
>>>> -----BEGIN PGP SIGNATURE-----
>>>> Version: GnuPG v1.4.9 (GNU/Linux)
>>>> Comment: Using GnuPG with Mozilla -
>>>> FJkAoKgnkWnPayPtPqMfk3/AxOj2xaMx
>>>> =V7Zq
>>>> -----END PGP SIGNATURE-----
>>> --
>>> Chris Anderson

View raw message