From: Randall Leeds
To: user@couchdb.apache.org, dev@couchdb.apache.org
Subject: Re: Replication -- explaining errors
Date: Fri, 26 Mar 2010 13:20:17 -0700
In-Reply-To: <4BAC8639.9080606@linkfluence.net>

On Fri, Mar 26, 2010 at 03:02, Germain Maurice wrote:
> Hi all,
>
> Still with my problems with replication.
>
> I will write you a report on a crash of couchdb happened this night but now
> i launched again continuous replication hostA to hostB and i get this error
> on hostA :
>
> [Fri, 26 Mar 2010 09:55:01 GMT] [debug] [<0.2466.0>] retrying
> couch_rep_httpc post request in 16.0 seconds due to {error, req_timedout}
> [Fri, 26 Mar 2010 09:56:13 GMT] [debug] [<0.2466.0>] retrying
> couch_rep_httpc post request in 32.0 seconds due to {error, req_timedout}
> [Fri, 26 Mar 2010 09:57:42 GMT] [debug] [<0.2466.0>] retrying
> couch_rep_httpc post request in 64.0 seconds due to {error, req_timedout}
> [Fri, 26 Mar 2010 09:59:40 GMT] [debug] [<0.2466.0>] retrying
> couch_rep_httpc post request in 128.0 seconds due to {error, req_timedout}

This means that during replication, Couch issued a POST request (most
likely to /_ensure_full_commit during a checkpoint) and the request
timed out. The retry delay backs off exponentially, doubling after each
timeout. Replication will crash after 10 retries.

There was a bug in 0.10.1 that caused replication to crash. I attempted
to fix some of the causes, but it seems there are still some issues. I
can reproduce this in production too, and I've had no luck tracking it
down yet.

I'm going to re-open the 597 ticket and continue the discussion there.

-Randall
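
For illustration, the retry behaviour described above (the delay doubles
after each timeout, and the caller gives up after a fixed number of
retries) can be sketched in Python. This is only a sketch under stated
assumptions, not the couch_rep_httpc code: the names retry_delays and
call_with_backoff are hypothetical, and the starting delay is left as a
parameter because the log excerpt only shows the 16, 32, 64 and 128
second steps.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def retry_delays(first_delay: float, max_retries: int) -> List[float]:
    """Return the wait before each retry, doubling every time.

    Hypothetical helper: the doubling matches the 16/32/64/128 second
    steps in the log, but the real starting value is an assumption.
    """
    return [first_delay * 2 ** n for n in range(max_retries)]


def call_with_backoff(
    request: Callable[[], T],
    first_delay: float,
    max_retries: int = 10,  # "crash after 10 retries", per the message
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call request(), retrying on TimeoutError with doubling delays.

    Makes one initial attempt plus up to max_retries retries. If the
    last retry also times out, the TimeoutError propagates to the caller.
    """
    for delay in retry_delays(first_delay, max_retries):
        try:
            return request()
        except TimeoutError:
            # Wait before the next attempt; the delay doubles each round.
            sleep(delay)
    # Final retry: a timeout here is not caught.
    return request()
```

Passing a custom sleep function (for example a list's append method)
makes it possible to inspect the schedule without actually waiting.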