References: <067AA5E4-0E5F-46C7-85EE-FC9CBCF99490@apache.org>
Date: Fri, 22 May 2009 00:34:19 -0400
Subject: Re: reiterating transactions vs. replication
From: Paul Davis
To: dev@couchdb.apache.org

>>
>> 1) Generate foo/1 and bar/1 in an atomic _bulk_docs operation
>> 2) Update foo -> foo/2
>> Compact the DB (foo/1 is deleted)
>> Start replicating to a mirror
>> Replication crashes before it reaches foo/2
>
> By crash you mean an error due to a conflict between foo/2 and foo/1'
> (the mirror's version of foo), right?
>

Pretty sure he means the network link fails (or code fails, etc).

>> In your proposal, we should expect foo/1 to exist on the mirror, right? I
>> think this means we'd need to modify the compaction algorithm to keep
>> revisions of documents if a) the revision was part of an atomic _bulk_docs,
>> and b) any of the documents in that transaction are still at the revision
>> generated by the transaction. Same thing goes for revision stemming -- we
>> can never drop revisions if they were part of an atomic upload and at least
>> one of the document revs in the upload is still current.
>

In general, there were two main ideas that I saw when reading
literature on this subject:

1. Keep some sort of edit history
2. If a replication event occurs, and the target node is too out of
date, then trigger a full database copy.

At one point I suggested something like:

1. Keep some sort of edit history. We already do this.
And with revision stemming we already have a configurable "How much
history is kept" option. The fancy twist is that we don't remove old
revisions from the update sequence btree until the revision stemming
removes the revision info data. These revisions are then replicated as
part of normal replication.
2. In the case of replicating from a node that's too far out of whack,
instead of a full database copy, we just fall back to our current
replication scheme in that we lose all transaction guarantees (or the
guarantees that we can no longer guarantee, this is all quite hand
wavy).

For point 1, I can see one of a few methods to deal with transactions.
Either we don't commit any of the docs in the transaction until they
all make it across the wire, or we just mark them as a conflict (with
maybe an 'in_transaction' modifier or some such). Keeping track of
revisions is pretty cake because all the documents would be sequential
in the update sequence btree. And it should also be easy to tell when
a transaction is so old that we no longer have all the data necessary
to make it work.

As Yuval describes, the underlying idea would be that you only pay the
cost if you so choose.

On the flip side, this adds a decent amount of complexity to the
replicator and bookkeeping to other parts of the database.

> Yep. Personally I see this as a tradeoff, not a limitation per se. If
> you specify 'atomic' then you must pay more in terms of data size,
> performance, etc.
>

[snip]

> What concerns me is Damien's post
> (http://mail-archives.apache.org/mod_mbox/couchdb-dev/200902.mbox/%3c451872B8-152C-42A6-9324-DD52534D9A32@apache.org%3e):
>
>> No, CouchDB replication doesn't support replicating the transactions.
>> Never has, never will. That's more like transaction log replication
>> that's in traditional dbs, a different beast.
>>
>> For the new bulk transaction model, I'm only proposing supporting
>> eventual consistency.
>> All changes are safe to disk, but the db may not
>> be in a consistent state right away.
>
> From what I know this assumption is wrong. Eventual consistency still
> needs atomic primitives, it's not about whether or not you have
> transactions, it's about what data they affect (eventual consistency
> involves breaking them down).
>

I'm not sure I follow this part. What aspect of eventual consistency
requires atomicity guarantees? CouchDB eventual consistency is like
making dinner plans with a large group of friends. Sometimes different
parts of the network might have a different idea of which restaurant
everyone's meeting at, but assuming everyone remembered to charge
their phones, eventually everyone will get to the right place.

> Anyway, "never will" sounds pretty binding, but for the sake of argument:
>

I think he was referring to the heavy log replication stuff that
RDBMSs tend towards. From what I've read these types of approaches
require runtime characteristics that don't fit with the rest of
CouchDB. If we found a transaction model that worked without hampering
the core design goals of CouchDB then I'm pretty sure everyone would
be extremely enthused about it.

[snip]

>
> However, in another post Damien said:
>
>> Which is why in general you want to avoid inter-document dependencies,
>> or be relaxed in how you deal with them.
>
> So I think I best shut up after this without some decision maker
> telling me not to, if my use case is not covered by the intended
> design then that's that, but I do think this thread sort of covers
> this:
>

Damien's advice is the best idea for most scenarios. It may end up
causing a bit more planning up front for what happens if you have
conflicts and how to take care of such things, but as it turns out,
once you have it working, then you have a huge amount of awesome you
can tap into that just isn't available otherwise (without orders of
magnitude more pain, etc, etc).
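
As a rough illustration of that kind of up-front planning, here is a
minimal Python sketch of an application-level conflict resolver. It is
only a sketch under assumptions: the function name resolve_conflict,
the updated_at field, and the "newest timestamp wins" rule are made up
for this example and are not part of CouchDB. The only CouchDB
conventions it relies on are the _id, _rev and _deleted document
fields. It takes the conflicting revisions of one document (already
fetched by whatever means) and returns the revision to keep plus
deletion stubs for the rest.

```python
def resolve_conflict(revisions):
    """Pick a winner among conflicting revisions of one document.

    `revisions` is a list of dicts, each a full revision body carrying
    `_id` and `_rev`. `updated_at` is a HYPOTHETICAL application-level
    timestamp; CouchDB does not provide it.

    Returns (winner, tombstones), where tombstones are deletion stubs
    for every losing revision.
    """
    if not revisions:
        raise ValueError("need at least one revision")
    if len({rev["_id"] for rev in revisions}) != 1:
        raise ValueError("revisions belong to different documents")

    # Assumed rule: newest application timestamp wins. Ties are broken
    # by the _rev string so every node running this picks the same
    # winner regardless of the order the revisions arrive in.
    winner = max(revisions, key=lambda r: (r.get("updated_at", 0), r["_rev"]))

    # Each losing revision becomes a deletion stub for that exact _rev.
    tombstones = [
        {"_id": r["_id"], "_rev": r["_rev"], "_deleted": True}
        for r in revisions
        if r["_rev"] != winner["_rev"]
    ]
    return winner, tombstones


if __name__ == "__main__":
    # Made-up revision ids and timestamps, for illustration only.
    conflicting = [
        {"_id": "foo", "_rev": "2-aaa", "updated_at": 10, "value": "old"},
        {"_id": "foo", "_rev": "2-bbb", "updated_at": 20, "value": "new"},
    ]
    keep, drop = resolve_conflict(conflicting)
    print(keep["_rev"], drop)
```

The deletion stubs could then be written back in one ordinary
_bulk_docs request. Nothing here needs cross-document atomicity: if the
cleanup is interrupted, running the resolver again picks the same
winner.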

I won't tell anyone to shut up, especially when they've clearly done
some thinking and have good insight into the problem. I will say that
this particular problem has come up before and I have a feeling that
there are more people than just me that are a bit weary from it. I
only took the time to respond this time because you'd made such a
reasoned argument. Though, the more times I end up writing out long
responses to how we might do replication and the requirements and this
and that, the more likely I'll be to just tag any and all replication
emails with "will only discuss working code". Judging from date stamps
in that thread, it's been four months and not one person has offered
even a broken-almost-but-not-quite-working patch. In the words of
Damien's blog tagline, "Everybody keeps on talking about it. Nobody's
getting it done".

>> As far as distributed transactions go, I'd be thrilled if we could
>> implement it and also support the rest of couchdb, like views and
>> bi-directional replication. Please start up a discussion here in dev@
>> about it and see if you can work out a design.
>
> Without going too pie-in-the-sky.
>

I think it'd be appropriate to amend that to: If anyone wants this
feature, then start sending code. We're all happy to help introduce
people to the code base if guidance is required, but enough time has
gone by that it's hard to seriously consider proposals with no
concrete realization.

> Cheers,
> Yuval
>

HTH,
Paul Davis