Mailing-List: contact dev-help@couchdb.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@couchdb.apache.org
MIME-Version: 1.0
In-Reply-To: <a891e1bd0905212030g67001a1akd02bf85a6aaa4ad9@mail.gmail.com>
References: <a891e1bd0905210400n34981084t426fb3b55ca18ac6@mail.gmail.com>
	 <067AA5E4-0E5F-46C7-85EE-FC9CBCF99490@apache.org>
	 <a891e1bd0905212030g67001a1akd02bf85a6aaa4ad9@mail.gmail.com>
Date: Fri, 22 May 2009 00:34:19 -0400
Message-ID: <e2111bbb0905212134p2503e841o5cad5637f4b76d31@mail.gmail.com>
Subject: Re: reiterating transactions vs. replication
From: Paul Davis <paul.joseph.davis@gmail.com>
To: dev@couchdb.apache.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

>>
>> 1) Generate foo/1 and bar/1 in an atomic _bulk_docs operation
>> 2) Update foo -> foo/2
>> Compact the DB (foo/1 is deleted)
>> Start replicating to a mirror
>> Replication crashes before it reaches foo/2
>
> By crash you mean an error due to a conflict between foo/2 and foo/1'
> (the mirror's version of foo), right?
>

Pretty sure he means the network link fails (or code fails, etc).

>> In your proposal, we should expect foo/1 to exist on the mirror, right?  I
>> think this means we'd need to modify the compaction algorithm to keep
>> revisions of documents if a) the revision was part of an atomic _bulk_docs,
>> and b) any of the documents in that transaction are still at the revision
>> generated by the transaction.  Same thing goes for revision stemming --
>> we can never drop revisions if they were part of an atomic upload and at least
>> one of the document revs in the upload is still current.
>
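
To make the quoted retention rule concrete, here is a rough sketch of it
as a predicate. Nothing here is CouchDB code: `Rev`, `must_keep`, and the
idea that we record which revisions were written together by one atomic
_bulk_docs call are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rev:
    doc_id: str
    rev: int


def must_keep(candidate, txn_revs, current_revs):
    """Return True if compaction would have to retain `candidate`.

    txn_revs:     set of Rev written by the same atomic _bulk_docs call
                  (hypothetical bookkeeping; empty for non-atomic writes)
    current_revs: dict mapping doc_id -> current revision number
    """
    # The live revision of a document is always kept.
    if current_revs.get(candidate.doc_id) == candidate.rev:
        return True
    # (a) the old revision must have been part of an atomic upload ...
    if candidate not in txn_revs:
        return False
    # (b) ... and at least one revision from that upload is still current.
    return any(current_revs.get(r.doc_id) == r.rev for r in txn_revs)
```

With the scenario above (foo/1 and bar/1 written atomically, then foo
updated to foo/2), foo/1 has to survive compaction for as long as bar is
still at bar/1, and can be dropped once bar moves on as well.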

In general, there were two main ideas that I saw when reading
literature on this subject:

1. Keep some sort of edit history
2. If a replication event occurs, and the target node is too out of
date, then trigger a full database copy.

At one point I suggested something like:

1. Keep some sort of edit history.
    We already do this. And with revision stemming we already have a
configurable "How much history is kept" option. The fancy twist is
that we don't remove old revisions from the update sequence btree
until the revision stemming removes the revision info data. These
revisions are then replicated as part of normal replication.

2. In the case of replicating from a node that's too far out of whack,
instead of a full database copy, we just fall back to our current
replication scheme in that we lose all transaction guarantees (or the
guarantees that we can no longer guarantee, this is all quite hand
wavy).
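
As a hand-wavy sketch of the two points above, the replicator could
compare how far behind the target is with how much history the source
still holds. The names (`choose_mode`, `target_last_seq`,
`source_oldest_retained_seq`) are hypothetical, and the assumption that
the source can report the oldest update sequence it still retains is
mine, not something CouchDB exposes today.

```python
def choose_mode(target_last_seq, source_oldest_retained_seq):
    """Pick a replication mode for one source/target pair.

    target_last_seq:            last update sequence the target has seen
    source_oldest_retained_seq: oldest update sequence for which the
                                source still has every revision
                                (assumed to be tracked by stemming)
    """
    # Everything the target is missing is still in the source's history,
    # so transaction boundaries can be replayed intact.
    if target_last_seq + 1 >= source_oldest_retained_seq:
        return "transactional"
    # Too far out of whack: fall back to the current scheme and give up
    # the transaction guarantees instead of copying the whole database.
    return "plain"
```

A target checkpointed at sequence 10 against a source that has stemmed
everything before sequence 12 would get "plain"; if the source still
held history from sequence 11 or earlier it would get "transactional".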

For point 1, I can see one of a few methods to deal with transactions.
Either we don't commit any of the docs in the transaction until they
all make it across the wire, or we just mark them as a conflict (with
maybe an 'in_transaction' modifier or some such). Keeping track of
revisions is pretty cake because all the documents would be sequential
in the update sequence btree. And it should also be easy to tell when
a transaction is so old that we no longer have all the data necessary
to make it work.
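
The first of those methods (hold everything back until the whole
transaction has crossed the wire) might look roughly like this on the
target side. It is only a sketch: `StagingTarget`, `txn_id`, and
`txn_size` are invented for the example, and it assumes the source tags
each replicated doc with its transaction and the number of docs in it.

```python
class StagingTarget:
    """Toy target that commits a transaction's docs all at once."""

    def __init__(self):
        self.committed = {}  # doc_id -> rev, visible to readers
        self.pending = {}    # txn_id -> {doc_id: rev}, not yet visible

    def receive(self, doc_id, rev, txn_id=None, txn_size=1):
        """Accept one replicated doc; return True once it is committed."""
        if txn_id is None:
            # Ordinary, non-transactional doc: commit immediately.
            self.committed[doc_id] = rev
            return True
        batch = self.pending.setdefault(txn_id, {})
        batch[doc_id] = rev
        if len(batch) < txn_size:
            return False  # still waiting on the rest of the transaction
        # Last doc arrived: make the whole batch visible together.
        self.committed.update(self.pending.pop(txn_id))
        return True
```

If replication dies after foo arrives but before bar does, foo just sits
in `pending` and readers on the target never see half a transaction.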

As Yuval describes, the underlying idea would be that you only pay the
cost if you so choose.

On the flip side, this adds a decent amount of complexity to the
replicator and bookkeeping to other parts of the database.

> Yep. Personally I see this as a tradeoff, not a limitation per se. If
> you specify 'atomic' then you must pay more in terms of data size,
> performance, etc.
>

[snip]

> What concerns me is Damien's post
> (http://mail-archives.apache.org/mod_mbox/couchdb-dev/200902.mbox/%3c451872B8-152C-42A6-9324-DD52534D9A32@apache.org%3e):
>
>> No, CouchDB replication doesn't support replicating the transactions.
>> Never has, never will. That's more like transaction log replication
>> that's in traditional dbs, a different beast.
>>
>> For the new bulk transaction model, I'm only proposing supporting
>> eventual consistency. All changes are safe to disk, but the db may not
>> be in a consistent state right away.
>
> From what I know this assumption is wrong. Eventual consistency still
> needs atomic primitives, it's not about whether or not you have
> transactions, it's about what data they affect (eventual consistency
> involves breaking them down).
>

I'm not sure I follow this part. What aspect of eventual consistency
requires atomicity guarantees? CouchDB eventual consistency is like
making dinner plans with a large group of friends. Sometimes different
parts of the network might have a different idea of which restaurant
everyone's meeting at, but assuming everyone remembered to charge
their phones, eventually everyone will get to the right place.

> Anyway, "never will" sounds pretty binding, but for the sake of argument:
>

I think he was referring to the heavy log replication stuff that
RDBMSs tend towards. From what I've read, these types of approaches
require runtime characteristics that don't fit with the rest of
CouchDB.

If we found a transaction model that worked without hampering the core
design goals of CouchDB then I'm pretty sure everyone would be
extremely enthused about it.

[snip]

>
> However, in another post Damien said:
>
>> Which is why in general you want to avoid inter-document dependencies,
>> or be relaxed in how you deal with them.
>
> So I think I best shut up after this without some decision maker
> telling me not to, if my use case is not covered by the intended
> design then that's that, but I do think this thread sort of covers
> this:
>

Damien's advice is the best idea for most scenarios. It may end up
causing a bit more planning up front for what happens if you have
conflicts and how to take care of such things, but as it turns out, once
you have it working, then you have a huge amount of awesome you can
tap into that just isn't available otherwise (without orders of
magnitude more pain, etc, etc).

I won't tell anyone to shut up, especially when they've clearly done
some thinking and have good insight into the problem. I will say that
this particular problem has come up and I have a feeling that there
are more people than just me that are a bit weary from it. I only took
the time to respond this time because you'd made such a reasoned
argument.

Though, the more times I end up writing out long responses to how we
might do replication and the requirements and this and that the more
likely I'll be to just tag any and all replication emails with "will
only discuss working code". Judging from date stamps in that thread,
it's been four months and not one person has offered even a
broken-almost-but-not-quite-working patch. In the words of Damien's
blog tagline, "Everybody keeps on talking about it. Nobody's getting
it done".

>> As far as distributed transactions go, I'd be thrilled if we could
>> implement it and also support the rest of couchdb, like views and bi-
>> directional replication. Please start up a discussion here in dev@
>> about it and see if you can work out a design.
>
> Without going too pie-in-the-sky.
>

I think it'd be appropriate to amend that to: If anyone wants this
feature, then start sending code. We're all happy to help introduce
people to the code base if guidance is required, but enough time has
gone by that it's hard to seriously consider proposals with no concrete
realization.

> Cheers,
> Yuval
>

HTH,
Paul Davis