couchdb-dev mailing list archives

From Alex Besogonov <alex.besogo...@gmail.com>
Subject Re: Unique instance IDs?
Date Mon, 12 Dec 2011 05:29:58 GMT
On Sun, Dec 11, 2011 at 8:19 PM, Randall Leeds <randall.leeds@gmail.com> wrote:
> I proposed UUIDs for databases a long, long time ago and it's come up
> a few times since. If the UUID is database-level, then storing it with
> the database is dangerous -- copying a database file would result in
> two CouchDB's hosting "the same" (but really different) databases. If
> the UUID is host-level, then this reduces to a re-invention of DNS. In
> other words, all DBs should already be uniquely identified by their
> URLs.
Do people really copy databases? If they do, a UUID for the DB instance
plus a UUID for the host should do fine. Host UUIDs can be generated
during CouchDB installation, which should be the easiest way.
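
A minimal sketch of the host-plus-database UUID idea, assuming the host
UUID is written once to a local file at install time. The file layout and
the names load_or_create_host_uuid and instance_id are hypothetical, not
part of CouchDB:

```python
import json
import os
import uuid


def load_or_create_host_uuid(path):
    """Return the host UUID stored at `path`, creating it on first use.

    Hypothetical helper: stands in for "generate once at install time".
    """
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)["host_uuid"]
    host_uuid = uuid.uuid4().hex
    with open(path, "w") as f:
        json.dump({"host_uuid": host_uuid}, f)
    return host_uuid


def instance_id(host_uuid, db_uuid):
    """Combine the host UUID and the database UUID into one identifier."""
    return f"{host_uuid}:{db_uuid}"
```

With this scheme, a database file copied to another machine keeps its
database UUID but picks up that machine's host UUID, so the two copies get
different instance IDs.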

There's no good way to uniquely identify hosts, unfortunately (or
fortunately). MAC addresses are not reliable and the set of network
interfaces can change rapidly.

And URLs are definitely out of the question - I'm thinking of using my
replicator in home devices that might have duplicate host names and
IP addresses assigned by DHCP.

> Regarding your second paragraph, replicating couches _could_ try to
> establish common ancestry only by examining a local checkpoint of
> replication, but the couch replicator looks for the log on both
> couches to ensure that the database hasn't been deleted+recreated nor
> has it crashed before certain replicated changes hit disk, as a double
> check that the sequence numbers have the expected shared meaning.
Yes, I guessed that's what ensure_full_commit is used for.

> It seems like maybe you're wondering about whether couch could
> generate snapshot ids that are more meaningful than the sequence
> number. For a single pair of couches the host-db-seq combo is enough
> information to replicate effectively. When there's more hosts involved
> we can talk about more powerful checkpoint ids that would be shareable
> or resolvable to find common ancestry between more than two
> replicating hosts to speed up those scenarios. My intuition always
> says that this leads to hash trees, but I haven't thought about it
> deeply enough to fully conceive of what this accomplishes or how it
> would work.
Hash trees are definitely interesting, especially since I really want to
have deterministic IDs for revisions. But their overhead is something to
be considered. Right now I'm at 150000 document insertions/sec for
non-bulk updates and I really like the speed.
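
One way to picture deterministic revision IDs is to hash the parent
revision together with a canonical encoding of the document body. This is
only a sketch under assumptions: the function name revision_id, the choice
of SHA-1, and the canonical JSON encoding are all illustrative, and it is
not a description of how CouchDB or the replicator discussed here computes
revisions:

```python
import hashlib
import json


def revision_id(parent_rev, body):
    """Derive a revision ID from the parent revision and document body.

    Hypothetical scheme: the same parent and the same content always
    yield the same ID, regardless of key order in `body`.
    """
    # Sorted keys and fixed separators give a stable byte representation.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha1()
    digest.update((parent_rev or "").encode("utf-8"))
    digest.update(b"\0")  # separator between parent and body
    digest.update(canonical.encode("utf-8"))
    return digest.hexdigest()
```

Two hosts applying the same edit to the same parent revision would then
arrive at the same ID independently, at the cost of one hash computation
per insertion.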
