couchdb-dev mailing list archives

From Nick North <nort...@gmail.com>
Subject Time-ordered document ids including the database identity
Date Thu, 29 Dec 2011 13:34:51 GMT
This suggestion is for an enhancement to the document id generation
algorithms in CouchDB. I am new to CouchDB, and this question addresses an
old issue (https://issues.apache.org/jira/browse/COUCHDB-465) so please
forgive me if I am retreading old ground.

My application has a number of mutually replicating CouchDB instances and I
would like document ids to be monotonically-increasing per-instance, and
globally unique, and for the instance where the document was created to be
determinable from the id. (To be more accurate - I don't need to know
anything about the instance itself; just whether any two documents
originated from the same instance.) The utc_random algorithm is not far
from meeting these requirements, as ids are monotonic and almost certainly
globally unique. However, the instance cannot be determined from the id,
and there is a tiny chance of an id clash between two instances. Both of
these issues could be solved if the random part of the id could be replaced
with a suffix that is fixed in the ini file for each instance.
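
As a rough illustration of what such ids would allow, here is a minimal
Python sketch (not CouchDB code). It assumes an id is the 14-character hex
UTC timestamp followed directly by the per-instance suffix; the names
split_id and same_origin, and the "node-a"/"node-b" suffixes, are
hypothetical.

```python
PREFIX_LEN = 14  # hex characters of the UTC microsecond timestamp (assumed layout)


def split_id(doc_id):
    """Split an id into (microseconds since the epoch, instance suffix)."""
    return int(doc_id[:PREFIX_LEN], 16), doc_id[PREFIX_LEN:]


def same_origin(id_a, id_b):
    """True if both ids carry the same instance suffix."""
    return split_id(id_a)[1] == split_id(id_b)[1]


# Example ids built by hand; the timestamps 10, 15 and 20 are arbitrary.
a = format(10, "014x") + "node-a"
b = format(20, "014x") + "node-a"
c = format(15, "014x") + "node-b"

print(same_origin(a, b))  # both from the "node-a" instance
print(same_origin(a, c))  # different instances
print(sorted([b, c, a]) == [a, c, b])  # fixed-width prefix sorts by time
```

Because the timestamp prefix has a fixed width, plain string comparison
orders ids by creation time, and the suffix breaks ties between instances.
This only holds if each instance is given a distinct suffix and its clock
does not step backwards.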

To address this, I have a modified version of couch_uuids.erl introducing
a new utc_machine_id algorithm which reads a machine_id string from the ini
file and then generates ids using an internal utc_suffix method that just
appends the string to the usual utc 14-byte string. utc_random then also
uses the utc_suffix method, but its suffix is the usual random byte string.
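
The structure described above can be sketched roughly as follows. This is
a Python illustration of the idea, not the Erlang patch itself: the
function names mirror the ones mentioned, the now_us parameter is a
hypothetical hook for passing a fixed time, and the 9 random bytes (18 hex
characters) for utc_random are assumed from the stock algorithm.

```python
import os
import time


def utc_prefix(now_us=None):
    """Microseconds since the Unix epoch as 14 zero-padded hex characters."""
    if now_us is None:
        now_us = time.time_ns() // 1000
    return format(now_us, "014x")


def utc_suffix(suffix, now_us=None):
    """Shared helper: timestamp prefix followed by the given suffix."""
    return utc_prefix(now_us) + suffix


def utc_random(now_us=None):
    """Existing behaviour: the suffix is a fresh random hex string."""
    return utc_suffix(os.urandom(9).hex(), now_us)


def utc_machine_id(machine_id, now_us=None):
    """Proposed behaviour: the suffix is a fixed per-instance string."""
    return utc_suffix(machine_id, now_us)


print(utc_random())
print(utc_machine_id("node-a"))
```

In the real module the machine_id value would come from the ini file
rather than being passed as an argument.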

However, it is obviously a nuisance to have to maintain a non-standard
distribution, so I wondered if there is enough call for this sort of thing
to make it a part of the standard distribution? If there is, I'd be very
happy to make my code available for discussion/modification/inclusion. If
there are good reasons why this is a bad idea, then I'd also be very
interested to hear them so that I can rethink my ideas. (It happens that
the privacy and guessability concerns raised in the original discussion do
not apply in my case.) If this question has been beaten to death, then I'm
sorry for bothering the list, and would be grateful if someone could point
me to the discussions so that I can understand the issues. Many thanks,

Nick North
