couchdb-dev mailing list archives

From Randall Leeds <randall.le...@gmail.com>
Subject Re: Unique instance IDs?
Date Mon, 12 Dec 2011 01:19:27 GMT
On Sun, Dec 11, 2011 at 04:00, Alex Besogonov <alex.besogonov@gmail.com> wrote:
> I wonder, why there are no unique instance IDs in CouchDB? I'm
> thinking about 'the central server replicates 2000000 documents to a
> million of clients' scenario.
>
> Right now it's not possible to make replication on the 'big central
> server' side to be stateless, because the other side tries to write
> replication document which is later used to establish common ancestry.
> Server can ignore/discard it, but then during the next replication
> client would just have to replicate all the changes again. Of course,
> the results would be consistent in any case but quite a lot of
> additional traffic might be required.
>
> It should be simple to assign each instance a unique ID (computed
> using UUID and the set of applied replication filters) and use it to
> establish common replication history. It can even be compatible with
> the way the current replication system works and basically the only
> visible change should be the addition of UUID to database info.
>
> Or am I missing something?

I proposed UUIDs for databases a long, long time ago and it's come up
a few times since. If the UUID is database-level, then storing it with
the database is dangerous -- copying a database file would result in
two CouchDBs hosting "the same" (but really different) databases. If
the UUID is host-level, then this reduces to a re-invention of DNS. In
other words, all DBs should already be uniquely identified by their
URLs.
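
To make the URL argument concrete, here is a minimal Python sketch of
deriving a checkpoint identifier from the endpoints of a replication
instead of from a UUID stored in the database file. The function name
`replication_id`, the newline-joined key, and the use of MD5 are all
assumptions for illustration; this is not the computation CouchDB
itself performs.

```python
import hashlib


def replication_id(source_url, target_url, filter_name=""):
    """Hypothetical helper: derive a stable id for one replication.

    The id depends only on where the two databases live and which
    filter is applied, so nothing has to be stored inside the
    database file itself.
    """
    key = "\n".join([source_url, target_url, filter_name])
    return hashlib.md5(key.encode("utf-8")).hexdigest()


# Same endpoints and filter always give the same id.
same_a = replication_id("http://central/db", "http://client1/db")
same_b = replication_id("http://central/db", "http://client1/db")

# A copied database file served from another host gets a different id,
# because its URL differs even though the file contents are identical.
copied = replication_id("http://central-copy/db", "http://client1/db")
```

Under this scheme a copied file cannot be mistaken for the original,
which is the failure mode a UUID stored with the database would have.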

Regarding your second paragraph, replicating couches _could_ try to
establish common ancestry by examining only a local checkpoint of
replication, but the couch replicator looks for the log on both
couches to ensure that the database hasn't been deleted and recreated,
and that it hasn't crashed before certain replicated changes hit disk.
This is a double check that the sequence numbers have the expected
shared meaning.
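
A rough Python sketch of that two-sided check follows. It assumes each
side holds a replication log shaped as a dict with `session_id`,
`source_last_seq`, and a newest-first `history` list of entries
carrying `session_id` and `recorded_seq`. The function name
`find_common_checkpoint` is hypothetical, and this is a simplification
of the idea, not the replicator's actual code.

```python
def find_common_checkpoint(source_log, target_log):
    """Return the sequence to resume from, or 0 for a full replication."""
    # A missing log on either side means there is no agreed history,
    # e.g. the database was deleted and recreated.
    if source_log is None or target_log is None:
        return 0

    # Both sides recorded the same latest session: resume from there.
    session = source_log.get("session_id")
    if session is not None and session == target_log.get("session_id"):
        return source_log.get("source_last_seq", 0)

    # Otherwise fall back to the most recent session both sides know
    # about, e.g. when one side crashed before its last checkpoint
    # reached disk.
    target_sessions = {e["session_id"] for e in target_log.get("history", [])}
    for entry in source_log.get("history", []):
        if entry["session_id"] in target_sessions:
            return entry["recorded_seq"]

    return 0


source = {
    "session_id": "s2",
    "source_last_seq": 10,
    "history": [
        {"session_id": "s2", "recorded_seq": 10},
        {"session_id": "s1", "recorded_seq": 5},
    ],
}
# The target never persisted session s2, so only s1 is shared.
target = {
    "session_id": "s1",
    "source_last_seq": 5,
    "history": [{"session_id": "s1", "recorded_seq": 5}],
}
resume_seq = find_common_checkpoint(source, target)
```

If the server discards its copy of the log, the `None` branch is what
forces the client to replicate every change again.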

It seems like maybe you're wondering about whether couch could
generate snapshot ids that are more meaningful than the sequence
number. For a single pair of couches the host-db-seq combo is enough
information to replicate effectively. When there are more hosts involved,
we can talk about more powerful checkpoint ids that would be shareable
or resolvable to find common ancestry between more than two
replicating hosts to speed up those scenarios. My intuition always
says that this leads to hash trees, but I haven't thought about it
deeply enough to fully conceive of what this accomplishes or how it
would work.

-R
