couchdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Anderson <>
Subject Re: svn commit: r897509 - /couchdb/trunk/etc/couchdb/
Date Mon, 11 Jan 2010 18:42:10 GMT
On Mon, Jan 11, 2010 at 4:58 AM, Robert Dionne
<> wrote:
> I suspect the assertion that key size matters is valid, though to what extent I'm not
sure. There is an open ticket to improve the b-tree (#224).
> I spent some time last summer reviewing couch_btree. It's quite elegant in that its passes
inserts/deletes/lookups down through the recursion at the same time. However it's not balanced,
technically (per definitions in Knuth) it's not a b-tree, it has no order. The number of nodes
written at each level is determined by the number to write and a constant, called chunk size.
> jira-224 makes an odd claim that it misses "a balancing condition", it doesn't balance
at all so I'm not sure what this means.  #224 asserts that if it were balanced compaction
would go much faster as it would remove the need to recompute reductions, which are stored
with internal nodes and computed as the tree is constructed from the leaves up.
> I assumed that balancing is less important due to the append only nature. Ironically
what makes the implementation quite elegant also makes balancing non-trivial. The delete case
in particular is hard because one wants to do splitting of nodes opportunistically on the
way down the recursion to minimize disk access.
> The change to use a sequentially increasing random ids improves locality a lot, especially
in bulk inserts, but I would be -1 on changing the default.
> Best,
> Bob
> On Jan 11, 2010, at 4:39 AM, Robert Newson wrote:
>> Roger,
>> Unless I misread you, you are implying that _id is stored increasingly
>> less inefficiently in the .couch file as its length increases? I don't
>> think, unless you've really dug into the disk structure, that this
>> assertions will hold. A better measure is the number of btree nodes on
>> disk for the different kinds of _id. I would expect to see more nodes
>> (and a higher rate of 'turnover') for large random ids than equally as
>> large sequential ids. Large random ids versus short sequential ids
>> should show a larger difference but for two unrelated reasons.
>> B.
>> On Mon, Jan 11, 2010 at 9:33 AM, Robert Newson <> wrote:
>>> I should point out that the sequential algorithm in couchdb is
>>> carefully constructed so that generated ids won't clash, even in a
>>> distributed system. You might have assumed that the sequential ids
>>> were 1, 2, 3, 4, ... and so on, but they are not.
>>> The sequential ids are the same length as the random ids (16 bytes).
>>> The first 13 bytes stay the same for around 8000 generated ids and is
>>> then rerandomized. The ids with the same prefix have suffixes in
>>> strictly increasing numeric order. This characteristic (that a new id
>>> is numerically close to the previous id) is what helps with insertion
>>> speed and general b-tree performance.
>>> Before changing the default I think it would be worth getting numbers
>>> from a suitably fair benchmark, I would still advocate random as the
>>> default until that is done.

My experience is that from time to time someone has a support request
where the symptom is "CouchDB is so slow as to be unusable" and the
answer is "set sequential uuids" and they are happen and CouchDB
"works" again.

Support requests are like cockroaches, for everyone you see there 100
others you don't. This math means the default random uuids is one of
the bigger bugs CouchDB ships with, and the switch to sequential is
one of the smallest patches with the biggest positive impacts we could

The downsides to sequential uuids are these (unless I've missed one).

Info leakage - the sequential uuids could give big brother an idea who
created a given document.

Gives the wrong idea - people will do stupid things like use the _id
in lieu of a timestamp or the local_seq for ordering.

Could be better - there's maybe an even better uuid algorithm we could discover.

I think the first case is important, but the others aren't that
compelling. Is there anything I'm missing?


>>> B.
>>> On Mon, Jan 11, 2010 at 12:51 AM, Chris Anderson <> wrote:
>>>> On Sun, Jan 10, 2010 at 4:24 PM, Roger Binns <>
>>>>> Hash: SHA1
>>>>> Chris Anderson wrote:
>>>>>> I'm not feeling super-strong about this. However, making the default
>>>>>> sequential seems like it will preempt a lot of the problems people
>>>>>> tend to show up asking about.
>>>> If we think that speed and size are more important than randomness, we
>>>> should continue to refine uuid generators.
>>>> Roger, if you can make a short sequential that'd be neat.
>>>>> There are several issues conflated together:
>>>>> - - When doing inserts, sorted ids are faster
>>>>> - - The resulting size of the db file is the size of the docs plus a
>>>>> of the _id size (and probably an exponential of the size)
>>>>> - - Sequential ids give small _id
>>>>> - - Random ids give large _id
>>>>> - - Sequentials will clash between different dbs (consider replication,
>>>>> multiple instances etc).  They'll also lead people to rely on this
>>>>> functionality as though it was like a SQL primary key
>>>>> - - Random ids won't clash and better illustrate how CouchDB really works
>>>>>> I think the info-leakage argument is overblown
>>>>> It does make URLs easy to guess like many ecommerce sites that didn't
>>>>> validate when showing you an invoice - you added one to the numeric id
>>>>> the URL and got to see someone elses.
>>>>> I would far prefer the size of the db file and the size of the _id link
>>>>> being addressed.  Because the _id size can cause the db file to get
so big,
>>>>> I/O etc is a lot slower mainly because there is just so much more file
>>>>> deal with!  (In my original posting I had a db go from 21GB to 4GB by
>>>>> reducings ids from 16 bytes to 4 bytes.)
>>>>> Roger
>>>>> -----BEGIN PGP SIGNATURE-----
>>>>> Version: GnuPG v1.4.9 (GNU/Linux)
>>>>> Comment: Using GnuPG with Mozilla -
>>>>> iEYEARECAAYFAktKb8AACgkQmOOfHg372QRb0ACfRWu1TUOs3twwmOGgAUOwhLfx
>>>>> FJkAoKgnkWnPayPtPqMfk3/AxOj2xaMx
>>>>> =V7Zq
>>>>> -----END PGP SIGNATURE-----
>>>> --
>>>> Chris Anderson

Chris Anderson

View raw message