From: Adam Kocoloski
To: dev@couchdb.apache.org
Subject: Re: handling simultaneous identical replications
Date: Thu, 5 Mar 2009 08:00:52 -0500
References: <3299B0C3-3038-4B6F-8B2F-B40885A7E8F4@gmail.com>

On Mar 5, 2009, at 7:24 AM, Jan Lehnardt wrote:

> On 5 Mar 2009, at 07:31, Paul Davis wrote:
>
>> On Wed, Mar 4, 2009 at 8:34 PM, Adam Kocoloski wrote:
>>> Hi folks, we've been running into a problem where multiple replications with
>>> the same source and target are running simultaneously. This introduces
>>> quite a lot of unnecessary network traffic and causes real problems with
>>> update collisions on the local replication history documents. If replicator
>>> A updates the source doc and replicator B updates the target doc, subsequent
>>> replications will decide that a full replication is necessary.
>>>
>>> I have some ideas about how to ensure only one is running at a time (more on
>>> that in a separate mail), but I'd like some feedback on how to handle the
>>> second..Nth request.
>>> Let's call the initial POST to _replicate "A" and the second POST "B":
>>>
>>> Option 1 -- Respond to B with the results from A
>>> This option works fine if the source is remote. However, if the source is
>>> local, the replication started by A will be missing updates to the source DB
>>> that occurred between A and B. B may be surprised by that result.
>>>
>>> Option 2 -- Grab an updated DB and continue the replication
>>> This option will include updates to the source that occurred between A and B
>>> in the response to both requests.
>>>
>>> Option 3 -- Respond to A, then trigger another replication for B
>>> In this case we wait till the replication started by A has completed, then
>>> do an incremental one and respond to B with the results of that incremental.
>>>
>>> I think I'd vote for 3. Cheers, Adam
>>
>> If I follow this correctly, the issue is, "POST to _replicate, a
>> second POST to _replicate occurs before the first request finishes"
>> (with the same source/target info).
>>
>> My knowledge of replication is only cursory, but I could also see:
>>
>> Option 4:
>>
>> Same as views, we wait for replication to finish and return the same
>> result to all clients that made a request.
>
> I understand this and Adam's option 3 to be the same. What am I
> missing? :)

No, not quite. In Option 3 the two requesters get different responses. A gets the result of the original request. B gets the result of a second replication, triggered automatically after the first one finishes, which picks up any updates to the DB that happened during the first pass. If no updates occurred, B will receive the result of the first replication.

Paul's Option 4 is more like Options 1 and 2, where A and B get identical responses. The difference between 1 and 2 is just whether new updates get included in that response. Whew.

>> Option 5:
>>
>> Return an error on B that says, "Yeah, yeah. Already on it."
> This would make replication behave a bit like compaction.

Sort of, in that additional triggers are no-ops. Option 1 also has that behavior.

> I think I like 3/4 best.
>
> Cheers
> Jan
>
> --

Best, Adam
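
For anyone skimming the thread, here is a toy model of how Option 3 differs from Options 1 and 4. It is only an illustrative Python sketch, not CouchDB code: `Source`, `Option3Coordinator`, `post`, and `complete_pass` are hypothetical names, and tracking progress with a single sequence number is an assumption made to keep the example small.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    """Toy source DB that only tracks an update sequence number (assumption)."""
    update_seq: int = 0


@dataclass
class Option3Coordinator:
    """Hypothetical model of Option 3 for a single source/target pair."""
    source: Source
    checkpoint: int = 0                          # last source seq copied to the target
    snapshot: int = 0                            # source seq the running pass stops at
    active: list = field(default_factory=list)   # requesters answered by the running pass
    waiting: list = field(default_factory=list)  # requesters that arrived mid-pass

    def post(self, requester):
        """Model a POST to _replicate; a pass starts only if none is running."""
        if self.active:
            self.waiting.append(requester)
            return "queued"
        self._begin([requester])
        return "started"

    def _begin(self, requesters):
        # A pass only sees the source as of the moment it starts.
        self.active = list(requesters)
        self.snapshot = self.source.update_seq

    def complete_pass(self):
        """Finish the running pass and return the responses it delivers."""
        if not self.active:
            return {}
        result = {"start_seq": self.checkpoint, "end_seq": self.snapshot}
        self.checkpoint = self.snapshot
        responses = {name: result for name in self.active}
        queued, self.active, self.waiting = self.waiting, [], []
        if queued and self.source.update_seq > self.checkpoint:
            # Updates landed during the pass: run one incremental pass for the waiters.
            self._begin(queued)
        else:
            # Nothing new on the source: waiters receive the first pass's result.
            responses.update({name: result for name in queued})
        return responses


if __name__ == "__main__":
    src = Source(update_seq=10)
    coord = Option3Coordinator(src)
    print(coord.post("A"))        # first request starts a pass
    src.update_seq = 12           # source is updated while A's pass runs
    print(coord.post("B"))        # second request is queued, not run in parallel
    print(coord.complete_pass())  # response for A only
    print(coord.complete_pass())  # response for B, from the incremental pass
```

Under Options 1 or 4, `complete_pass` would instead hand the same `result` to every requester unconditionally. Option 5 would have `post` return an error instead of queueing.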