couchdb-dev mailing list archives

From Jason Smith
Subject Re: Unique instance IDs?
Date Mon, 12 Dec 2011 03:21:04 GMT
On Mon, Dec 12, 2011 at 9:52 AM, Paul Davis wrote:
> On Sun, Dec 11, 2011 at 7:19 PM, Randall Leeds wrote:
>> On Sun, Dec 11, 2011 at 04:00, Alex Besogonov wrote:
>>> I wonder why there are no unique instance IDs in CouchDB. I'm
>>> thinking about a 'central server replicates 2000000 documents to a
>>> million clients' scenario.
>>> Right now it's not possible to make replication on the 'big central
>>> server' side stateless, because the other side tries to write a
>>> replication document which is later used to establish common ancestry.
>>> The server can ignore/discard it, but then during the next replication
>>> the client would just have to replicate all the changes again. Of course,
>>> the results would be consistent in any case but quite a lot of
>>> additional traffic might be required.
>>> It should be simple to assign each instance a unique ID (computed
>>> using a UUID and the set of applied replication filters) and use it to
>>> establish common replication history. It can even be compatible with
>>> the way the current replication system works and basically the only
>>> visible change should be the addition of a UUID to the database info.
>>> Or am I missing something?
>> I proposed UUIDs for databases a long, long time ago and it's come up
>> a few times since. If the UUID is database-level, then storing it with
>> the database is dangerous -- copying a database file would result in
>> two CouchDBs hosting "the same" (but really different) databases. If
>> the UUID is host-level, then this reduces to a re-invention of DNS. In
>> other words, all DBs should already be uniquely identified by their
>> URLs.
>> Regarding your second paragraph, replicating couches _could_ try to
>> establish common ancestry only by examining a local checkpoint of
>> replication, but the couch replicator looks for the log on both
>> couches to ensure that the database hasn't been deleted+recreated nor
>> has it crashed before certain replicated changes hit disk, as a double
>> check that the sequence numbers have the expected shared meaning.
>> It seems like maybe you're wondering about whether couch could
>> generate snapshot ids that are more meaningful than the sequence
>> number. For a single pair of couches the host-db-seq combo is enough
>> information to replicate effectively. When there are more hosts involved
>> we can talk about more powerful checkpoint ids that would be shareable
>> or resolvable to find common ancestry between more than two
>> replicating hosts to speed up those scenarios. My intuition always
>> says that this leads to hash trees, but I haven't thought about it
>> deeply enough to fully conceive of what this accomplishes or how it
>> would work.
>> -R
> I did have a glimmer of an idea for this a while back. Basically we
> do both host and db UUIDs and the information we use to identify
> replications is a hash of the concatenation.
> That way we can copy DBs around and not muck with things as well as
> error out a bit. Though this still has a bit of an issue if we copy
> the host UUID around as well. Though we might be able to look for a
> MAC address or something and then fail to boot if the check fails
> (with an optional override if someone changes a NIC).
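
The hash-of-concatenation idea quoted above can be sketched roughly as
follows. This is only an illustration under stated assumptions, not
CouchDB's replicator code: the helper name replication_id, the ':'
separator, and the choice of SHA-256 are all made up for the example.

```python
import hashlib
import uuid


def replication_id(host_uuid: str, db_uuid: str) -> str:
    """Hypothetical helper: hash the concatenation of host and db UUIDs."""
    # The separator keeps ("ab", "c") distinct from ("a", "bc").
    payload = f"{host_uuid}:{db_uuid}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Assumed setup: two hosts, one database file with its own UUID.
host_a = uuid.uuid4().hex
host_b = uuid.uuid4().hex
db = uuid.uuid4().hex

original = replication_id(host_a, db)
# Database file copied to a different host: the identifier changes.
copied_db = replication_id(host_b, db)
# Host UUID copied along with the database: the identifier collides,
# which is the remaining issue mentioned in the quoted message.
cloned_host = replication_id(host_a, db)
```

Here original and copied_db differ, while original and cloned_host are
equal, so a copied host UUID would still need some extra check.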

A couch URL is its unique identifier. A database URL is its unique
identifier. This sounds like a too-clever-by-half optimization. IMHO.

Iris Couch
