From: Antony Blakey <antony.blakey@gmail.com>
To: dev@couchdb.apache.org
Subject: Re: Transactional _bulk_docs
Date: Sat, 7 Feb 2009 10:30:44 +1030

On 07/02/2009, at 4:47 AM, Jim McCoy wrote:

> Returning, as always, to Brewer's CAP conjecture, you can have any two
> of consistency, availability, or partition-tolerance. Scalaris went
> with Paxos sync and selected consistency and partition-tolerance. This
> means that if you lose a majority of your Scalaris nodes, it becomes
> impossible to log a write to the system. In CouchDB, availability and
> partition-tolerance were prioritized ("eventual consistency" being
> that pleasant term we use to play down the fact that no consistency
> assurances can actually be made beyond a best-effort attempt to fix
> things after the fact).
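(As an aside, the majority rule Jim describes is easy to make concrete.
The sketch below is purely illustrative - it is not Scalaris code, and
the Node class is invented for the example - but it shows why losing a
majority of nodes makes writes impossible once you've chosen consistency
and partition-tolerance over availability:)

    # Illustrative majority-quorum write check; not Scalaris code.
    class Node:
        """Stand-in for a cluster member; 'up' models reachability."""
        def __init__(self, up=True):
            self.up = up
            self.store = {}

        def accept(self, key, value):
            if not self.up:
                raise ConnectionError("node unreachable")
            self.store[key] = value

    def quorum_write(nodes, key, value):
        acks = 0
        for node in nodes:
            try:
                node.accept(key, value)
                acks += 1
            except ConnectionError:
                pass  # node down or partitioned away
        if acks * 2 <= len(nodes):  # strict majority required
            raise RuntimeError("no majority reachable: write refused")

    # Five nodes, three unreachable: the write is refused outright,
    # preserving consistency at the cost of availability.
    cluster = [Node(), Node(), Node(up=False), Node(up=False), Node(up=False)]
    quorum_write(cluster, "doc", {"x": 1})  # raises RuntimeError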
I think we're talking at cross-purposes here. I'm talking about
transactions over a Couch cluster, not over an arbitrary set of peers.
The proposed modifications to the API are to allow the cluster to
present the illusion of a single server even though it's implemented
over multiple nodes.

I have yet to see a definition of the multi-node scenario that prompted
this change - it's referred to alternately as multi-node and as
partitioned, where partitioned means data partitioning, not node
connectivity - but the talk of data partitioning, and the fact that
cluster-side code is required, gives some clues. What distinguishes
such a cluster from an arbitrary group of peers? Well, for one, there
is some additional server code/functionality that provides a single
entry point to the cluster and gives it the semantics and appearance
of a single server.

And the rest of the paragraph you refer to is:

>> Why compromise the API for something that is so ephemeral that
>> conflict management isn't feasible? What IS feasible? View
>> consistency? MVCC semantics? If I write a document and then read it,
>> might I see an earlier version? What about in the view? Because if
>> views are going to have a consistency guarantee wrt. writes, then
>> that looks to me like distributed MVCC. Is this not 2PC? And if
>> views aren't consistent, then why bother? Why not have a client
>> layer that does content-directed load balancing?

My point being that when talking about a cluster you probably already
have some consistency constraints above and beyond those of an
arbitrary set of peers. So don't these extra constraints already
provide an environment in which an ACID operation can be offered? And
if there are no extra constraints, then I think such a cluster devolves
to something that could be done with a load-balancing layer.
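(To make the 2PC point concrete: here is a toy two-phase-commit
coordinator - again purely illustrative, not CouchDB code, with all
names invented - showing that a cluster-wide visibility guarantee means
a prepare/commit exchange with every participating node, which is
exactly the message growth Jim raises below:)

    # Toy two-phase commit; illustrative only, not CouchDB internals.
    class Participant:
        def __init__(self):
            self.staged = None
            self.committed = {}

        def prepare(self, update):
            self.staged = update  # vote yes by staging the update
            return True

        def commit(self):
            self.committed.update(self.staged)
            self.staged = None

        def abort(self):
            self.staged = None

    def two_phase_commit(participants, update):
        # Phase 1: every participant must vote yes before anything is visible.
        if not all(p.prepare(update) for p in participants):
            for p in participants:
                p.abort()
            return False
        # Phase 2: the update becomes visible everywhere, or nowhere.
        for p in participants:
            p.commit()
        return True

    # Two messages per participant per transaction: coordination traffic
    # grows with the number of nodes touched by the commit.
    nodes = [Participant() for _ in range(5)]
    two_phase_commit(nodes, {"doc1": {"rev": "2-abc"}})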
> I believe the "unbounded number of servers" that Damien was
> mentioning refers to the idea that with multi-node operations the
> activity over which you want an ACID commit could become a very large
> set if the number of participating nodes grows significantly. This
> would mean that a large system would require an ever-growing number
> of messages to be passed around, which is a scalability bottleneck.
>
> It is simply not possible to offer what you want within CouchDB
> unless either availability or partition-tolerance is sacrificed.

I'm making the presumption, on the basis of this discussion referring
to data partitioning, that multi-node Couch clusters are not partition
tolerant, as opposed to e.g. p2p replication meshes. It's the
difference between these two scenarios, and the way that difference
can be taken advantage of within Couch applications, that prompts my
comments about the nature of the UI that is presented for local
(clustered) vs. distributed operations, e.g.

>> Regardless, this discussion is also about whether supporting a
>> single-node style of operation is useful, because CouchDB had an
>> ACID _bulk_docs. IMO it's also about the danger/cost of abstracting
>> over operational models with considerably different operational
>> characteristics - c.f. transparent distribution in object models.

and:

>> The use-case of someone doing a local operation, e.g. submitting a
>> web form, is very different from resolving replication conflicts.
>> Conflict during a local operation is a matter of application
>> concurrency, whereas conflict during replication is driven by the
>> overall system model. It has different temporal, administrative and
>> UI boundaries.
>>
>> In short, I think it is a mistake to try and hide the different
>> characteristics of local (even clustered) operations and
>> replication. You may disagree, but if the system distinguishes
>> between these two fundamentally different things (distinguished by
>> their partition-tolerance), you can code as though every operation
>> leads to conflict if you wish, whereas I can't take advantage of the
>> difference if the system doesn't distinguish between the two cases.
>> Distinguishing between them allows for a wider range of uses and
>> application models.

Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

Isn't it enough to see that a garden is beautiful without having to
believe that there are fairies at the bottom of it too?
  -- Douglas Adams