Date: Fri, 22 May 2009 14:30:29 -0400
From: Randall Leeds
To: dev@couchdb.apache.org
Subject: Re: reiterating transactions vs. replication

On Fri, May 22, 2009 at 12:27, Randall Leeds wrote:
>
> Since I do like the component model, I'm planning to set up a GitHub
> project to play with some consensus protocols and overlay networks in
> Erlang. Hopefully once I start doing that I'll start to see the places
> that CouchDB can hook into it and get a nice, clean, flexible API. I see
> the problem broken into several tiers:
>
> Transactional Bulk Docs (this is the wishlist and challenge, but it has
> to rest on the layers below)
> Sharding/Replication (_seq consensus; possibly consistent hashing or
> another distributed, deterministic data structure mapping BTree nodes to
> servers [2])
> Communication (either Erlang or TCP with a pluggable overlay network for
> routing)

A revised breakdown should be something like:

Transactional Bulk-Docs
Single-Doc Multi-Replica Transactions
Replication / Sharding
Network

Example:

Transactional Bulk-Docs (server pre-prepares itself as leader for a
special bulk round)
Single-Doc Multi-Replica Transactions (simple consensus; a special leader
for the bulk case, a pre-determined leader normally)
Replication / Sharding (any sort of load-balancing, slicing, or static
configuration)
Network (Chord and derivatives (Scalaris uses Chord#), Tapestry, Pastry,
etc.)

I think that with the right configurations and components, transactional
bulk-docs are just a special case of single-doc transactions. For example,
if the single-doc layer optimizes for fewer communication rounds by
pre-selecting leaders on a rotating basis, a bulk transaction just involves
revoking the pre-selected leadership for a sequence-number consensus round
and using an extra round trip to "take over" the leader position. Then all
nodes holding replicas of all the documents involved would have to
participate in this new round (or at least a majority of the replicas).
Having 'atomic=false' could skip this expense, make a best-effort serial
execution of the updates, and fail on conflict.

Just trying to keep the conversation rolling. But I understand we have to
hit the code soon if this really stands to go somewhere.
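To make the sharding tier concrete: the "distributed, deterministic data
structure mapping BTree nodes to servers" could be something like a
consistent-hash ring, where every replica computes the same owner for a key
with no communication. This is an illustrative sketch only, not CouchDB
code; the `HashRing` class and the node names are hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring mapping keys (doc ids, BTree nodes) to servers."""

    def __init__(self, servers, vnodes=64):
        # vnodes: virtual points per server, smooths the key distribution
        self._ring = sorted(
            (self._hash(f"{s}:{i}"), s) for s in servers for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.sha1(value.encode()).hexdigest(), 16)

    def lookup(self, key):
        # The first ring position clockwise from the key's hash owns it
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["node_a", "node_b", "node_c"])
owner = ring.lookup("doc-123")  # deterministic: every node computes the same owner
```

Because the mapping is a pure function of the key and the membership list,
any replica can route a request without a coordination round, which is what
makes it attractive as the bottom layer of the stack above.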
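The rotating-leader idea with a bulk "take over" round might look roughly
like the toy model below. Everything here is hypothetical (class and node
names, the ack count standing in for real message rounds); an actual
implementation would need genuine consensus messaging and failure handling.

```python
class SeqConsensus:
    """Toy model of rotating leadership with a bulk takeover round.

    Normally the leader for sequence number n is pre-determined by
    round-robin rotation, so a single-doc update needs no election.
    A bulk (atomic) transaction pays one extra round trip to take over
    leadership, and proceeds only if a majority of replicas accept.
    """

    def __init__(self, nodes):
        self.nodes = nodes
        self.seq = 0

    def leader_for(self, seq):
        # Pre-determined rotation: no communication needed to agree
        return self.nodes[seq % len(self.nodes)]

    def single_doc_update(self, doc):
        self.seq += 1
        return self.leader_for(self.seq), self.seq

    def bulk_update(self, docs, acks):
        # Extra round trip: the would-be leader asks all nodes to cede
        # the next sequence number; a majority of acks is required.
        if acks <= len(self.nodes) // 2:
            raise RuntimeError("bulk takeover rejected: no majority")
        self.seq += 1
        return self.seq


cluster = SeqConsensus(["n1", "n2", "n3"])
leader, seq = cluster.single_doc_update("doc-1")  # no election round needed
```

This also shows why 'atomic=false' is cheaper: skipping the takeover round
turns the bulk update into a series of ordinary single-doc updates that
each fail independently on conflict.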