couchdb-dev mailing list archives

From "Nick North (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (COUCHDB-1373) Time-ordered document ids including the database identity
Date Fri, 13 Jan 2012 16:38:41 GMT

    [ https://issues.apache.org/jira/browse/COUCHDB-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185661#comment-13185661
] 

Nick North commented on COUCHDB-1373:
-------------------------------------

The Flake algorithm is interesting. I also started with Twitter's algorithm and, at the time,
shared Twitter's desire to fit ids into a 64-bit long - that affected my choice of suffix.
I'm also not an experienced Erlang programmer, so I was heavily influenced by the existing CouchDB
utc_random algorithm, as it could be adapted with minimal changes.

I think Flake is overly complex, because it uses a millisecond clock plus a separate sequence
id. This is perhaps because its authors want an algorithm that can be implemented in any language.
In Erlang we can use the full microsecond precision of the clock and rely on the fact that
repeated calls to now() are guaranteed to be monotonically increasing. However, Flake's use of the
MAC address as a machine identifier saves on config file entries, and could be adopted.
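
As an illustration of the point above, here is a minimal Python sketch of a microsecond
clock that never returns the same or a smaller value twice, which is the property the
comment relies on in Erlang. It is not CouchDB or Flake code; the class name MonotonicMicros
and its injectable clock parameter are assumptions made for this example.

```python
import threading
import time


class MonotonicMicros:
    """Hypothetical helper: strictly increasing microsecond timestamps."""

    def __init__(self, clock=None):
        # The clock is injectable so the behaviour can be tested deterministically.
        self._clock = clock or (lambda: time.time_ns() // 1000)
        self._last = -1
        self._lock = threading.Lock()

    def next(self):
        with self._lock:
            now = self._clock()
            # If the wall clock has not advanced (or went backwards),
            # step one microsecond past the last value handed out.
            if now <= self._last:
                now = self._last + 1
            self._last = now
            return now


# A frozen clock still yields distinct, increasing values.
frozen = MonotonicMicros(clock=lambda: 100)
values = [frozen.next() for _ in range(3)]
```

With a clock like this, no separate sequence counter is needed: the timestamp alone orders
ids generated within one process.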

Left to my own devices I would stick with the current proposal, but a more Flake-like one
would suit me as well. But if I had to implement it, I'm not sure how long it would take with
my limited Erlang knowledge and dev environment. 
                
> Time-ordered document ids including the database identity
> ----------------------------------------------------------
>
>                 Key: COUCHDB-1373
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1373
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Database Core
>            Reporter: Nick North
>            Priority: Minor
>              Labels: uuid
>         Attachments: 0001-Add-etap-for-jira-1373.patch, 0002-utc_id_suffix.patch, 0003-utc_id_suffix.patch,
>                      couch_uuids.patch, utc_machine_id.patch
>
>
> This suggestion is for an enhancement to the document id generation algorithms in CouchDb.
> I am new to CouchDb, and this question addresses an old issue (https://issues.apache.org/jira/browse/COUCHDB-465)
> so please forgive me if I am retreading old ground.
> My application has a number of mutually replicating CouchDb instances and I would like
> document ids to be monotonically-increasing per-instance, and globally unique, and for the
> instance where the document was created to be determinable from the id. (To be more accurate
> - I don't need to know anything about the instance itself; just whether any two documents
> originated from the same instance.) The utc_random algorithm is not far from meeting these
> requirements, as ids are monotonic and almost certainly globally unique. However, the instance
> cannot be determined from the id, and there is a tiny chance of an id clash between two instances.
> Both of these issues could be solved if the random part of the id could be replaced with a
> suffix that is fixed in the ini file for each instance.
> To address this I have a modified version of couch_uuids.erl introducing a new utc_machine_id
> algorithm which reads a machine_id string from the ini file and then generates ids using an
> internal utc_suffix method that just appends the string to the usual utc 14-byte string. utc_random
> then also uses the utc_suffix method, but its suffix is the usual random byte string.
> However, it is obviously a nuisance to have to maintain a non-standard distribution,
> so I wondered if there is enough call for this sort of thing to make it a part of the standard
> distribution? If there is, I'd be very happy to make my code available for discussion/modification/inclusion.
> If there are good reasons why this is a bad idea, then I'd also be very interested to hear
> them so that I can rethink my ideas. (It happens that the privacy and guessability concerns
> raised in the original discussion do not apply in my case.) If this question has been beaten
> to death, then I'm sorry for bothering the group, and would be grateful if someone could point
> me to the discussions so that I can understand the issues.
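
The id scheme described in the quoted issue can be sketched roughly as follows. This is a
Python illustration only, not the attached Erlang patch. It assumes the "utc 14-byte string"
is a zero-padded, 14-digit hexadecimal count of microseconds since the Unix epoch and that
the random suffix is 9 random bytes in hex; the function names mirror those in the
description, and same_origin is a hypothetical helper added for the example.

```python
import os
import time

PREFIX_LEN = 14  # assumed width of the time prefix, in hex digits


def utc_prefix(now_us=None):
    """Microseconds since the epoch as 14 zero-padded hex digits."""
    if now_us is None:
        now_us = time.time_ns() // 1000
    return format(now_us, "014x")


def utc_suffix(suffix, now_us=None):
    """Time prefix followed by an arbitrary suffix string."""
    return utc_prefix(now_us) + suffix


def utc_machine_id(machine_id, now_us=None):
    """Suffix is a fixed per-instance string (read from config in the proposal)."""
    return utc_suffix(machine_id, now_us)


def utc_random(now_us=None):
    """Suffix is a fresh random byte string, hex-encoded (assumed 9 bytes)."""
    return utc_suffix(os.urandom(9).hex(), now_us)


def same_origin(id_a, id_b):
    """Hypothetical check: two ids share an instance if their suffixes match."""
    return id_a[PREFIX_LEN:] == id_b[PREFIX_LEN:]


first = utc_machine_id("node-a", now_us=1)
second = utc_machine_id("node-a", now_us=2)
other = utc_machine_id("node-b", now_us=2)
```

Because the prefix has a fixed width, ids from one instance sort lexicographically in
creation order, and comparing suffixes answers the "same instance?" question without
knowing anything else about the instance.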

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       
