From: Antony Blakey <antony.blakey@gmail.com>
To: dev@couchdb.apache.org
Subject: Re: Transactional _bulk_docs
Date: Sat, 7 Feb 2009 10:30:44 +1030

On 07/02/2009, at 4:47 AM, Jim McCoy wrote:

> Returning, as always, to Brewer's CAP conjecture, you can have any two
> of consistency, availability, or partition-tolerance. Scalaris went
> with Paxos sync and selected consistency and partition-tolerance. This
> means that if you lose a majority of your Scalaris nodes, it becomes
> impossible to log a write to the system. In CouchDB, availability and
> partition-tolerance were prioritized ("eventual consistency" being
> that pleasant term we use to play down the fact that no consistency
> assurances can actually be made beyond a best-effort attempt to fix
> things after the fact).
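(As an aside, the majority rule Jim describes is easy to make concrete.
The sketch below is purely illustrative - it is not Scalaris code, and
the Node class is invented for the example - but it shows why losing a
majority of nodes makes writes impossible once you've chosen consistency
and partition-tolerance over availability:)

    # Illustrative majority-quorum write check; not Scalaris code.
    class Node:
        """Stand-in for a cluster member; 'up' models reachability."""
        def __init__(self, up=True):
            self.up = up
            self.store = {}

        def accept(self, key, value):
            if not self.up:
                raise ConnectionError("node unreachable")
            self.store[key] = value

    def quorum_write(nodes, key, value):
        acks = 0
        for node in nodes:
            try:
                node.accept(key, value)
                acks += 1
            except ConnectionError:
                pass  # node down or partitioned away
        if acks * 2 <= len(nodes):  # strict majority required
            raise RuntimeError("no majority reachable: write refused")

    # Five nodes, three unreachable: the write is refused outright,
    # preserving consistency at the cost of availability.
    cluster = [Node(), Node(), Node(up=False), Node(up=False), Node(up=False)]
    quorum_write(cluster, "doc", {"x": 1})  # raises RuntimeError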
I think we're talking at cross-purposes here. I'm talking about
transactions over a Couch cluster, not over an arbitrary set of peers.
The proposed modifications to the API are to allow the cluster to
present the illusion of a single server even though it's implemented
over multiple nodes.

I have yet to see a definition of the multi-node scenario that prompted
this change - it's referred to alternately as multi-node and as
partitioned, where partitioned means data partitioning, not node
connectivity - but the talk of data partitioning, and the fact that
cluster-side code is required, gives some clues. What distinguishes
such a cluster from an arbitrary group of peers? Well, for one, there
is some additional server code/functionality that provides a single
entry point to the cluster and gives it the semantics and appearance
of a single server.

And the rest of the paragraph you refer to is:

>> Why compromise the API for something that is so ephemeral that
>> conflict management isn't feasible? What IS feasible? View
>> consistency? MVCC semantics? If I write a document and then read it,
>> might I see an earlier version? What about in the view? Because if
>> views are going to have a consistency guarantee wrt. writes, then
>> that looks to me like distributed MVCC. Is this not 2PC? And if
>> views aren't consistent, then why bother? Why not have a client
>> layer that does content-directed load balancing?

My point being that when talking about a cluster you probably already
have some consistency constraints above and beyond those of an
arbitrary set of peers. So don't these extra constraints already
provide an environment in which an ACID operation can be offered? And
if there are no extra constraints, then I think such a cluster devolves
to something that could be done with a load-balancing layer.
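(To make the 2PC point concrete: here is a toy two-phase-commit
coordinator - again purely illustrative, not CouchDB code, with all
names invented - showing that a cluster-wide visibility guarantee means
a prepare/commit exchange with every participating node, which is
exactly the message growth Jim raises below:)

    # Toy two-phase commit; illustrative only, not CouchDB internals.
    class Participant:
        def __init__(self):
            self.staged = None
            self.committed = {}

        def prepare(self, update):
            self.staged = update  # vote yes by staging the update
            return True

        def commit(self):
            self.committed.update(self.staged)
            self.staged = None

        def abort(self):
            self.staged = None

    def two_phase_commit(participants, update):
        # Phase 1: every participant must vote yes before anything is visible.
        if not all(p.prepare(update) for p in participants):
            for p in participants:
                p.abort()
            return False
        # Phase 2: the update becomes visible everywhere, or nowhere.
        for p in participants:
            p.commit()
        return True

    # Two messages per participant per transaction: coordination traffic
    # grows with the number of nodes touched by the commit.
    nodes = [Participant() for _ in range(5)]
    two_phase_commit(nodes, {"doc1": {"rev": "2-abc"}})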
> I believe the "unbounded number of servers" that Damien was
> mentioning refers to the idea that with multi-node operations the
> activity over which you want an ACID commit could become a very large
> set if the number of participating nodes grows significantly. This
> would mean that a large system would require an ever-growing number
> of messages to be passed around, which is a scalability bottleneck.
>
> It is simply not possible to offer what you want within CouchDB
> unless either availability or partition-tolerance is sacrificed.

I'm making the presumption, on the basis of this discussion referring
to data partitioning, that multi-node Couch clusters are not partition
tolerant, as opposed to e.g. p2p replication meshes. It's the
difference between these two scenarios, and the way that difference
can be taken advantage of within Couch applications, that prompts my
comments about the nature of the UI that is presented for local
(clustered) vs. distributed operations, e.g.

>> Regardless, this discussion is also about whether supporting a
>> single-node style of operation is useful, because CouchDB had an
>> ACID _bulk_docs. IMO it's also about the danger/cost of abstracting
>> over operational models with considerably different operational
>> characteristics - c.f. transparent distribution in object models.

and:

>> The use-case of someone doing a local operation, e.g. submitting a
>> web form, is very different from resolving replication conflicts.
>> Conflict during a local operation is a matter of application
>> concurrency, whereas conflict during replication is driven by the
>> overall system model. It has different temporal, administrative and
>> UI boundaries.
>>
>> In short, I think it is a mistake to try and hide the different
>> characteristics of local (even clustered) operations and
>> replication. You may disagree, but if the system distinguishes
>> between these two fundamentally different things (distinguished by
>> their partition-tolerance), you can code as though every operation
>> leads to conflict if you wish, whereas I can't take advantage of the
>> difference if the system doesn't distinguish between the two cases.
>> Distinguishing between them allows for a wider range of uses and
>> application models.

Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

Isn't it enough to see that a garden is beautiful without having to
believe that there are fairies at the bottom of it too?
  -- Douglas Adams