couchdb-dev mailing list archives

From Damien Katz <dam...@apache.org>
Subject Re: svn commit: r897509 - /couchdb/trunk/etc/couchdb/default.ini.tpl.in
Date Mon, 11 Jan 2010 20:51:29 GMT

On Jan 11, 2010, at 11:21 AM, Paul Davis wrote:

>> My experience is that from time to time someone has a support request
>> where the symptom is "CouchDB is so slow as to be unusable" and the
>> answer is "set sequential uuids", and then they are happy and CouchDB
>> "works" again.
>> 
>> Support requests are like cockroaches: for every one you see, there are
>> 100 others you don't. This math means the default of random uuids is
>> one of the bigger bugs CouchDB ships with, and the switch to sequential
>> is one of the smallest patches with the biggest positive impacts we
>> could make.
> 
> Well, I wouldn't characterize random UUIDs as a bug, but yes, they
> happen to exacerbate the worst side of b-tree performance. Still, I
> don't think that speed alone is reason enough to change the default.
> 
>> The downsides to sequential uuids are these (unless I've missed one).
>> 
>> Info leakage - the sequential uuids could give big brother an idea who
>> created a given document.
>> 
>> Gives the wrong idea - people will do stupid things like use the _id
>> in lieu of a timestamp or the local_seq for ordering.
>> 
>> Could be better - there's maybe an even better uuid algorithm we could discover.
>> 
>> I think the first case is important, but the others aren't that
>> compelling. Is there anything I'm missing?
> 
> My biggest concern is that it exposes relative ordering and proximity
> information for documents created on a given node (and that information
> can carry across DBs). And it's a non-obvious leak, so people may not
> realize that they're leaking such information. It may seem like an
> abstract concern, but I think it's real enough that users should be
> forced to make that decision themselves.

I was the one who asked Chris to make the change. The current ids are the worst case for btree
insert performance, slowing and bloating both doc inserts and view indexing.

I don't see leakage as a problem. I don't think we've ever claimed as a feature that our generated
ids are somehow secure against someone figuring out when and where something might have been
created, and I don't know of anyone relying on that.
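Both properties argued over in this thread, insert locality and information leakage, show up in a small simulation. This is a minimal Python sketch, not CouchDB's Erlang code, and every name in it is hypothetical. `SequentialIds` assumes a layout of a 26-hex-char random prefix followed by a 6-hex-char counter that advances by a random step and restarts under a fresh prefix near overflow; `random_id` assumes 128 random bits. A sorted Python list stands in for the B-tree.

```python
import bisect
import secrets
from collections.abc import Iterable


class SequentialIds:
    """Hypothetical sketch of a 'sequential' generator (layout assumed):
    a 26-hex-char random prefix followed by a 6-hex-char counter that
    advances by a random step and restarts under a new prefix near overflow."""

    def __init__(self) -> None:
        self._new_prefix()

    def _new_prefix(self) -> None:
        self.prefix = secrets.token_hex(13)  # 13 random bytes -> 26 hex chars
        self.counter = self._step()

    @staticmethod
    def _step() -> int:
        return secrets.randbelow(0xFFD) + 1  # random increment in 1..0xFFD

    def next_id(self) -> str:
        doc_id = f"{self.prefix}{self.counter:06x}"
        if self.counter >= 0xFFF000:  # suffix nearly exhausted: start a new run
            self._new_prefix()
        else:
            self.counter += self._step()
        return doc_id


def random_id() -> str:
    """Hypothetical 'random'-style id: 128 random bits as 32 hex chars."""
    return secrets.token_hex(16)


def append_fraction(ids: Iterable[str]) -> float:
    """Fraction of inserts that land at the end of the sorted key list.
    In a B-tree an append only touches the rightmost leaf; any other
    insert lands in some interior leaf."""
    keys: list[str] = []
    at_end = 0
    for doc_id in ids:
        pos = bisect.bisect_right(keys, doc_id)
        if pos == len(keys):
            at_end += 1
        keys.insert(pos, doc_id)
    return at_end / len(keys) if keys else 0.0


if __name__ == "__main__":
    gen = SequentialIds()
    seq = [gen.next_id() for _ in range(2000)]
    rnd = [random_id() for _ in range(2000)]
    print("sequential appends:", append_fraction(seq))
    print("random appends:    ", append_fraction(rnd))
    # The leak: ids from one run share a prefix and sort in creation order.
    print("distinct prefixes:", len({i[:26] for i in seq}))
    print("sorted == created:", seq == sorted(seq))
```

Within one prefix run every sequential id sorts after the previous one, so each insert goes to the right-hand edge of the key space; random ids scatter inserts across the whole range. The same ordering is what makes the ids leak: ids from one run share a 26-char prefix, and sorting them recovers the order in which they were created.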

But I agree we should add to the documentation how ids are generated and their implications. If
someone wants crypto random ids, they can configure it.

-Damien


> 
> The sequential algorithm isn't time based, so the misuse concern doesn't
> really come into play nearly as much as it would if we were going to try
> the utc_random algorithm.
> 
> HTH,
> Paul Davis
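Paul's contrast with `utc_random` comes down to where the clock sits in the id. The following is a minimal Python sketch, assuming a `utc_random`-style layout of 14 hex characters of microseconds since the Unix epoch followed by 18 random hex characters; the function names are hypothetical and this is not CouchDB's own code. Because the timestamp is the leading part of the id, anyone holding an id can read back when it was created.

```python
import secrets
import time
from datetime import datetime, timedelta, timezone
from typing import Optional

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def utc_random_id(now_us: Optional[int] = None) -> str:
    """Hypothetical 'utc_random'-style id (layout assumed): 14 hex chars of
    microseconds since the Unix epoch, then 18 random hex chars."""
    if now_us is None:
        now_us = time.time_ns() // 1000
    return f"{now_us:014x}{secrets.token_hex(9)}"


def embedded_time(doc_id: str) -> datetime:
    """Read the creation time straight back out of the id's first 14 chars."""
    return EPOCH + timedelta(microseconds=int(doc_id[:14], 16))


if __name__ == "__main__":
    doc_id = utc_random_id()
    print(doc_id, "was created at", embedded_time(doc_id).isoformat())
```

The sequential algorithm, by contrast, advances its counter by random steps and carries no clock value, so its ids give ordering but not wall-clock time.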

