couchdb-dev mailing list archives

From Jason Smith <>
Subject Re: The replicator needs a superuser mode
Date Wed, 17 Aug 2011 03:15:32 GMT
On Wed, Aug 17, 2011 at 9:49 AM, Adam Kocoloski <> wrote:
> On Aug 16, 2011, at 10:31 PM, Jason Smith wrote:
>> On Tue, Aug 16, 2011 at 9:26 PM, Adam Kocoloski <> wrote:
>>> One of the principal uses of the replicator is to "make this database
>>> look like that one".  We're unable to do that in the general case today
>>> because of the combination of validation functions and out-of-order
>>> document transfers.  It's entirely possible for a document to be saved
>>> in the source DB prior to the installation of a ddoc containing a
>>> validation function that would have rejected the document, for the
>>> replicator to install the ddoc in the target DB before replicating the
>>> other document, and for the other document to then be rejected by the
>>> target DB.
>> Somebody asked about this on Stack Overflow. It was a very simple but
>> challenging question, but now I can't find it. Basically, he made your
>> point above.
>> Aren't you identifying two problems, though?
>> 1. Sometimes you need to ignore validation to just make a nice, clean copy.
>> 2. Replication batches (an optimization) are disobeying the change
>> sequence, which can screw up the replica.
> As far as I know the only reason one needs to ignore validation to make
> a nice clean copy is because the replicator does not guarantee the
> updates are applied on the target in the order they were received on the
> source.  It's all one issue to me.
>> I responded to #1 already.
>> But my feeling about #2 is that the optimization goes too far.
>> Replication batches should always have boundaries immediately before
>> and after design documents. In other words, batch all you want, but
>> design documents [1] must always be in a batch size of 1. That will
>> retain the semantics.
>> [1] Actually, the only ddocs needing their own private batches are
>> those with a validate_doc_update field.
> My standard retort to transaction boundaries is that there is no global
> ordering of events in a distributed system.  A clustered CouchDB can try
> to build a vector clock out of the change sequences of the individual
> servers and stick to that merged sequence during replication, but even
> then the ddoc entry in the feed could be "concurrent" with several other
> updates.  I rather like that the replicator aggressively mixes up the
> ordering of updates because it prevents us from making choices in the
> single-server case that aren't sensible in a cluster.

That is interesting. So if it is crucial that an application enforce
transaction semantics, then that application can go ahead and
understand the distribution architecture, and it can confirm that a
ddoc is committed and distributed among all nodes, and then it can
make subsequent changes or replications.

Or, written as a dialogue:

Developer: My application knows or cares that Couch is distributed.
Developer: My application depends on a validation function applying universally.
Developer: But my application won't bother to confirm that it's been
fully pushed before I make changes or replications.
Adam: WTF?

Snark aside, it's an excellent point. Thanks.
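
For what it's worth, the "confirm that a ddoc is committed and distributed among all nodes" step could look roughly like the sketch below. It is only an illustration under assumptions: the node URLs, database name, and revision are made up, the helper names (fetch_rev, ddoc_confirmed) are hypothetical, and comparing the winning _rev on each node is a deliberately simple check that ignores conflicts and authentication.

```python
import json
import urllib.error
import urllib.request


def fetch_rev(node_url, db, ddoc_id):
    """Return the _rev of a document on one node, or None if it is missing.

    Uses a plain GET on /{db}/{doc_id}; no authentication is attempted.
    """
    url = f"{node_url.rstrip('/')}/{db}/{ddoc_id}"
    try:
        with urllib.request.urlopen(url) as resp:
            return json.load(resp).get("_rev")
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


def ddoc_confirmed(node_urls, db, ddoc_id, expected_rev, fetch=fetch_rev):
    """True only if every listed node reports the expected ddoc revision.

    `fetch` is injectable so the check can be exercised without a server.
    """
    return all(fetch(node, db, ddoc_id) == expected_rev for node in node_urls)
```

An application that cares would call something like ddoc_confirmed(["http://node1:5984", "http://node2:5984"], "mydb", "_design/app", rev) and only start writing or replicating once it returns True.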

Iris Couch
