From: Sho Fukamachi
To: user@couchdb.apache.org
Subject: Re: replication error
Date: Mon, 2 Feb 2009 05:27:03 +1100

On 02/02/2009, at 5:01 AM, Adam Kocoloski wrote:

>> [...]
>
> That's odd. I tried setting a 120 second timeout and didn't have
> any trouble. Then again, I only ran the test suite; I didn't
> actually force a timeout to occur or anything. Sorry, I don't have
> any hints at the moment.

Guh. I'm an idiot. I'd forgotten to create the destination database.
In my haste to test it I used Futon, not my normal script, and of
course interpreted the error as a problem with the code I'd changed.
Sorry about that. :/

With the changes, it worked first time ..
although it did give a spurious error about how a server had restarted.

> Multipart won't solve the problem where ibrowse throws a timeout
> error even while it's still sending data. That seems like a pretty
> curious choice on ibrowse's part to me. Maybe when I have some more
> free time I can look into the timeout algo and see if it can be
> tweaked so that it only starts after the request has been fully
> transmitted. I think that would pretty much solve this problem.
> Barring that, I agree that some sort of back-off algorithm that
> lengthens the timeout after each failed request is warranted.
>
> There's also one more knob we can turn. During replication we are
> checking the memory consumption of the process collecting docs to
> send to the target. If it hits 10MB we send the bulk immediately,
> regardless of whether it's 1 doc, 10, or 99. 10MB may be much too
> high given a 30 second timeout window in which we have to transmit
> the data; 1MB is possibly a better fit for home broadband users. If
> you want to fiddle with that knob instead of the ibrowse timeout you
> can try changing line 224 of couch_rep.erl so that instead of
>
>     couch_util:should_flush()
>
> it would read (value is in bytes)
>
>     couch_util:should_flush(1000000)

Awesome tip. Thanks. Yeah, I had never noticed any problem with
server-to-server replication... only when I then tried to do it from
home...

> I don't have a strong opinion at this point in time about how many
> of these parameters ought to be tunable in local.ini.
>
> Best,

My opinion is usually that pretty much everything with a big effect,
like this, should have a sensible default but be overridable in
config. Failing that, maybe the default timeout should be raised?

Thanks heaps for your help.

Sho
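
For anyone reading this thread later, here is a rough sketch of the two ideas discussed above: flushing a batch once it reaches a byte threshold, and lengthening the timeout after each failed request. It is written in Python purely for illustration and is not CouchDB's Erlang code. The function names, the 100-document batch limit, the doubling factor, and the 600 second cap are all assumptions made for this example; only the 10MB and 1MB thresholds and the 30 second timeout come from the thread.

```python
from typing import Iterable, Iterator, List


def batch_by_size(docs: Iterable[bytes], max_bytes: int,
                  max_docs: int = 100) -> Iterator[List[bytes]]:
    """Yield batches of docs, flushing as soon as the pending batch
    reaches max_bytes or max_docs (both limits are illustrative)."""
    batch: List[bytes] = []
    size = 0
    for doc in docs:
        batch.append(doc)
        size += len(doc)
        if size >= max_bytes or len(batch) >= max_docs:
            yield batch
            batch, size = [], 0
    if batch:
        # Send whatever is left once the input is exhausted.
        yield batch


def required_upload_bps(batch_bytes: int, timeout_s: float) -> float:
    """Sustained upstream rate (bits per second) needed to transmit
    batch_bytes before timeout_s elapses, ignoring protocol overhead."""
    return batch_bytes * 8 / timeout_s


def timeout_for_attempt(attempt: int, initial_s: float = 30.0,
                        factor: float = 2.0, cap_s: float = 600.0) -> float:
    """Back-off: the timeout grows by `factor` after each failed
    attempt (attempt 0 is the first try), up to a hypothetical cap."""
    return min(initial_s * factor ** attempt, cap_s)
```

Under these assumptions, a 10,000,000 byte batch inside a 30 second window needs roughly 2.7 Mbit/s of sustained upstream bandwidth, while a 1,000,000 byte batch needs roughly 0.27 Mbit/s, which is why the smaller threshold suits home broadband better.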