From: Paul Davis
Date: Mon, 26 Oct 2009 11:57:39 -0400
Subject: Re: massive replication?
To: user@couchdb.apache.org

Miles,

On the one hand, it sounds like you could solve this with XMPP and use
CouchDB as a backing store. People have already connected RabbitMQ and
CouchDB; I can't imagine that connecting ejabberd and CouchDB would be much
harder. The pubsub extensions could very well be what you're after.

Though, that doesn't let you write awesome distributed routing protocols.
And really, where's the fun in that? I'm basically thinking something along
the lines of:

1. Initial peer discovery is seeded or auto-discovered from multicast
   messages, or some combination thereof depending on your network topology.

2. Nodes gossip information they've found through replication with nodes
   they know about. Something like: create a database that contains network
   connections and replicate it alongside your data. A view on this database
   would calculate each subscription endpoint's highest known update_seq and
   who has it (a rough sketch of such a view follows this list).

3. Each subscription endpoint is a database that can be replicated.
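A minimal sketch of that view, assuming a hypothetical "connections" database
where each document looks roughly like {"endpoint": "...", "node": "...",
"update_seq": 1234}. The database name, field names, and design doc name are
all made up for illustration; the view functions themselves are ordinary
CouchDB JavaScript, pushed and queried here from Python:

import requests

COUCH = "http://localhost:5984"   # local CouchDB node (placeholder)
DB = "connections"                # hypothetical gossip database

# Design doc with one view: highest known update_seq per subscription
# endpoint, and which node claims to have it. The map/reduce bodies are
# plain CouchDB JavaScript stored as strings.
ddoc = {
    "_id": "_design/gossip",
    "views": {
        "latest_seq": {
            "map": """
                function(doc) {
                  if (doc.endpoint && doc.update_seq) {
                    emit(doc.endpoint, {seq: doc.update_seq, node: doc.node});
                  }
                }""",
            "reduce": """
                function(keys, values, rereduce) {
                  var best = values[0];
                  for (var i = 1; i < values.length; i++) {
                    if (values[i].seq > best.seq) best = values[i];
                  }
                  return best;
                }""",
        }
    },
}

# First write will create the design doc; a rerun would need the current _rev.
requests.put("%s/%s/_design/gossip" % (COUCH, DB), json=ddoc)

# group=true reduces per endpoint: one row per endpoint with the highest
# update_seq seen anywhere in the gossiped data, and the node that has it.
rows = requests.get(
    "%s/%s/_design/gossip/_view/latest_seq" % (COUCH, DB),
    params={"group": "true"},
).json()["rows"]

for row in rows:
    print(row["key"], row["value"]["seq"], row["value"]["node"])

Replicating the connections database between peers then spreads this
per-endpoint knowledge the same way the data itself spreads.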
There are at least two hard parts I can spot right now:

1. Replication currently has no detection endpoint. We don't expose a list
   of _local/docs to anything, so detecting when a node is replicating to us
   would be harder. Adding a custom patch to alert a db_update_notification
   process on _local/doc updates would be fairly trivial (a sketch of such a
   listener appears below).

2. The fun part is the replication strategy: trying to maximize network
   throughput, avoid stampeding a single updater, etc. There's enough
   literature on this to get a general idea of where to go, but it'd still
   probably take some tuning and experimentation. With plenty of monitoring
   and some foresight, though, giving yourself the ability to tweak system
   parameters and measure the effects would be quite a lot of fun.

... Holy crap, I'm contemplating writing the basic routing logic in
Python/Ruby and replicating it out to the network, so that as each node gets
a copy the router replaces its logic dynamically. Now that'd be just plain
awesome.

Note that this sort of protocol doesn't attempt to save bandwidth by
multicasting documents, which may be a non-starter depending on your
topology and traffic characteristics. Instead the idea is to figure out how
to shape your replication topology so it reacts to and matches your physical
link network, yet with enough redundancy that it doesn't break as physical
links come and go.

Granted, that's all off the top of my head, so until I sit down and
implement it on a couple hundred nodes it may well be a bunch of hot air. :)

Paul Davis
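For reference, the db_update_notification hook Paul mentions already exists
for database-level events: you register an external command in the
[update_notification] section of the ini file, CouchDB spawns it, and writes
one JSON object per line to its stdin for each update (roughly
{"type": "updated", "db": "some_db"}). The per-_local/doc detail would only
exist with the custom patch he describes. A minimal listener sketch, with a
hypothetical "connections" database and a stubbed-out reaction:

import json
import sys

# Registered in CouchDB's local.ini, for example:
#
#   [update_notification]
#   gossip=/usr/bin/python /path/to/gossip_notify.py
#
# CouchDB keeps this process running and feeds it one JSON event per line.

def handle(event):
    if event.get("type") == "updated" and event.get("db") == "connections":
        # Hypothetical reaction: re-read the gossip view, pick peers to
        # replicate with, and so on. Left as a stub here.
        sys.stderr.write("connections db changed, would re-plan replication\n")

for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    try:
        handle(json.loads(line))
    except ValueError:
        # Ignore anything that isn't a JSON event line.
        continue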
On Mon, Oct 26, 2009 at 11:33 AM, Miles Fidelman wrote:
> Adam Kocoloski wrote:
>>
>> On Oct 26, 2009, at 10:45 AM, Miles Fidelman wrote:
>>
>>> The environment we're looking at is more of a mesh where connectivity is
>>> coming up and down - think mobile ad hoc networks.
>>>
>>> I like the idea of a replication bus, perhaps using something like
>>> spread (http://www.spread.org/) or spines (www.spines.org) as a
>>> multi-cast fabric.
>>>
>>> I'm thinking something like continuous replication - but where the
>>> updates are pushed to a multi-cast port rather than to a specific node,
>>> with each node subscribing to update feeds.
>>>
>>> Anybody have any thoughts on how that would play with the current
>>> replication and conflict resolution schemes?
>>>
>>> Miles Fidelman
>>
>> Hi Miles, this sounds like really cool stuff.  Caveat: I have no
>> experience using Spread/Spines and very little experience with IP
>> multicasting, which I guess is what those tools try to reproduce in
>> internet-like environments.  So bear with me if I ask stupid questions.
>>
>> 1) Would the CouchDB servers be responsible for error detection and
>> correction?  I imagine that complicates matters considerably, but it
>> wouldn't be impossible.
>
> Good question.  I hadn't quite thought that far ahead.  I think the basic
> answer is no (assume reliable multicast), but... some kind of healing
> mechanism would probably be required (see below).
>
>> 2) When these CouchDB servers drop off for an extended period and then
>> rejoin, how do they subscribe to the update feed from the replication bus
>> at a particular sequence?  This is really the key element of the setup.
>> When I think of multicasting I think of video feeds and such, where if
>> you drop off and rejoin you don't care about the old stuff you missed.
>> That's not the case here.  Does the bus store all this old feed data?
>
> Think of something like RSS, but with distributed infrastructure.
> A node would publish an update to a specific address (e.g., like
> publishing an RSS feed).
>
> All nodes would subscribe to the feed, and receive new messages in
> sequence.  When picking up updates, you ask for everything after a
> particular sequence number.  The update service maintains the data.
>
>> 3) Which steps of the replication do you envision using the replication
>> bus?  Just the _changes feed (essentially a list of docid:rev pairs) or
>> the actual documents themselves?
>
> Any change to a local copy of the database (i.e., everything).
>
>> The conflict resolution model shouldn't care about whether replication
>> is p2p or uses this bus.  Best,
>
> Thanks,
>
> Miles
>
> --
> In theory, there is no difference between theory and practice.
> In practice, there is.   .... Yogi Berra
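The "ask for everything after a particular sequence number" pattern in the
exchange above is essentially what CouchDB's _changes feed already provides:
every update bumps the database's update_seq, and a client can request only
the changes since a sequence it has already seen. A minimal polling sketch,
where the host, database name, and checkpoint file are made-up placeholders:

import requests

COUCH = "http://localhost:5984"   # placeholder CouchDB node
DB = "some_feed"                  # placeholder subscription database
CHECKPOINT = "last_seq.txt"       # where this client remembers its position

def load_last_seq():
    try:
        with open(CHECKPOINT) as f:
            return f.read().strip() or "0"
    except IOError:
        return "0"

def save_last_seq(seq):
    with open(CHECKPOINT, "w") as f:
        f.write(str(seq))

# Ask only for changes after the sequence we already have; the response is
# a list of {seq, id, changes: [{rev: ...}]} rows plus a last_seq value to
# checkpoint for the next request.
resp = requests.get(
    "%s/%s/_changes" % (COUCH, DB),
    params={"since": load_last_seq()},
).json()

for change in resp.get("results", []):
    # Each row names a doc id and the winning rev(s) changed since our seq.
    print(change["seq"], change["id"], change["changes"][0]["rev"])

save_last_seq(resp.get("last_seq", load_last_seq()))

Whether those rows travel peer to peer or over a multicast bus, the
since/last_seq checkpoint is what lets a node that has been offline rejoin
without replaying everything it already has.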