couchdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Robert Newson <robert.new...@gmail.com>
Subject Re: svn commit: r897509 - /couchdb/trunk/etc/couchdb/default.ini.tpl.in
Date Mon, 11 Jan 2010 09:39:20 GMT
Roger,

Unless I misread you, you are implying that _id is stored increasingly
less inefficiently in the .couch file as its length increases? I don't
think, unless you've really dug into the disk structure, that this
assertions will hold. A better measure is the number of btree nodes on
disk for the different kinds of _id. I would expect to see more nodes
(and a higher rate of 'turnover') for large random ids than equally as
large sequential ids. Large random ids versus short sequential ids
should show a larger difference but for two unrelated reasons.

B.

On Mon, Jan 11, 2010 at 9:33 AM, Robert Newson <robert.newson@gmail.com> wrote:
> I should point out that the sequential algorithm in couchdb is
> carefully constructed so that generated ids won't clash, even in a
> distributed system. You might have assumed that the sequential ids
> were 1, 2, 3, 4, ... and so on, but they are not.
>
> The sequential ids are the same length as the random ids (16 bytes).
> The first 13 bytes stay the same for around 8000 generated ids and is
> then rerandomized. The ids with the same prefix have suffixes in
> strictly increasing numeric order. This characteristic (that a new id
> is numerically close to the previous id) is what helps with insertion
> speed and general b-tree performance.
>
> Before changing the default I think it would be worth getting numbers
> from a suitably fair benchmark, I would still advocate random as the
> default until that is done.
>
> B.
>
> On Mon, Jan 11, 2010 at 12:51 AM, Chris Anderson <jchris@apache.org> wrote:
>> On Sun, Jan 10, 2010 at 4:24 PM, Roger Binns <rogerb@rogerbinns.com> wrote:
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>>
>>> Chris Anderson wrote:
>>>> I'm not feeling super-strong about this. However, making the default
>>>> sequential seems like it will preempt a lot of the problems people
>>>> tend to show up asking about.
>>>
>>
>> If we think that speed and size are more important than randomness, we
>> should continue to refine uuid generators.
>>
>> Roger, if you can make a short sequential that'd be neat.
>>
>>> There are several issues conflated together:
>>>
>>> - - When doing inserts, sorted ids are faster
>>>
>>> - - The resulting size of the db file is the size of the docs plus a multiple
>>> of the _id size (and probably an exponential of the size)
>>>
>>> - - Sequential ids give small _id
>>>
>>> - - Random ids give large _id
>>>
>>> - - Sequentials will clash between different dbs (consider replication,
>>> multiple instances etc).  They'll also lead people to rely on this
>>> functionality as though it was like a SQL primary key
>>>
>>> - - Random ids won't clash and better illustrate how CouchDB really works
>>>
>>>> I think the info-leakage argument is overblown
>>>
>>> It does make URLs easy to guess like many ecommerce sites that didn't
>>> validate when showing you an invoice - you added one to the numeric id in
>>> the URL and got to see someone elses.
>>>
>>> I would far prefer the size of the db file and the size of the _id link
>>> being addressed.  Because the _id size can cause the db file to get so big,
>>> I/O etc is a lot slower mainly because there is just so much more file to
>>> deal with!  (In my original posting I had a db go from 21GB to 4GB by
>>> reducings ids from 16 bytes to 4 bytes.)
>>>
>>> Roger
>>>
>>> -----BEGIN PGP SIGNATURE-----
>>> Version: GnuPG v1.4.9 (GNU/Linux)
>>> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>>>
>>> iEYEARECAAYFAktKb8AACgkQmOOfHg372QRb0ACfRWu1TUOs3twwmOGgAUOwhLfx
>>> FJkAoKgnkWnPayPtPqMfk3/AxOj2xaMx
>>> =V7Zq
>>> -----END PGP SIGNATURE-----
>>>
>>>
>>
>>
>>
>> --
>> Chris Anderson
>> http://jchrisa.net
>> http://couch.io
>>
>

Mime
View raw message