Date: Fri, 25 May 2012 12:08:24 +0000 (UTC)
From: "Benoit Chesneau (JIRA)"
To: dev@couchdb.apache.org
Reply-To: dev@couchdb.apache.org
Message-ID: <1678255582.1538.1337947704244.JavaMail.jiratomcat@issues-vm>
In-Reply-To: <952684468.7373.1334065519171.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (COUCHDB-1461) replication timeout and loop

    [ https://issues.apache.org/jira/browse/COUCHDB-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283328#comment-13283328 ]

Benoit Chesneau commented on COUCHDB-1461:
------------------------------------------

@filipe was about to merge this version (since it works in production). What will the change be?

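For readers without the attached test.py, the scenario in the issue quoted below can be sketched roughly as follows. This is not the attached script: it uses Python's standard urllib instead of couchdbkit, and the helper names (`replication_docs`, `put_json`, `reproduce`) are hypothetical. The database names, document ids, port, and replication directions are taken from the log; the exact document bodies used by test.py are an assumption.

```python
import json
import urllib.request

# Assumption: port 15984 is the one shown in the log; point this at your node.
SERVER_URI = "http://localhost:15984"


def replication_docs(server_uri):
    """Return {doc_id: body} for two replications in opposite directions."""
    remote = f"{server_uri}/testdb2/"
    return {
        "rep1": {"source": "testdb1", "target": remote},
        "rep2": {"source": remote, "target": "testdb1"},
    }


def put_json(url, body=None):
    """Issue a PUT with an optional JSON body and return the HTTP status."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method="PUT")
    req.add_header("Content-Type", "application/json")
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses,
    # for example when a database or document already exists.
    with urllib.request.urlopen(req) as resp:
        return resp.status


def reproduce(server_uri=SERVER_URI):
    """Create both databases, then store both replication docs back to back."""
    for db in ("testdb1", "testdb2"):
        put_json(f"{server_uri}/{db}/")
    # No sleep between the two PUTs: that is the condition the issue describes.
    for doc_id, body in replication_docs(server_uri).items():
        put_json(f"{server_uri}/_replicator/{doc_id}", body)


# reproduce()  # uncomment to run against a live CouchDB node
```

Calling `reproduce()` needs a running node that accepts unauthenticated writes; whether the timeout shows up will depend on the CouchDB version and on timing.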
> replication timeout and loop
> ----------------------------
>
>                 Key: COUCHDB-1461
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1461
>             Project: CouchDB
>          Issue Type: Bug
>    Affects Versions: 1.2, 1.3
>            Reporter: Benoit Chesneau
>         Attachments: 12x-0001-Avoid-possible-timeout-initializing-replications.patch, master-0001-Avoid-possible-timeout-initializing-replications.patch, test.py
>
>
> When you start replications in both directions at the same time, one of them times out and then restarts after 5s. Sometimes it does not recover cleanly. Adding a sleep between the two replications appears to work around the problem, but it shouldn't be needed.
> Attached is a script using couchdbkit to reproduce the problem. SERVER_URI needs to be changed to point to your CouchDB node.
> Log:
>
> 09:09:24.016 [info] 127.0.0.1 - - HEAD /testdb1/ 404
> 09:09:24.028 [info] 127.0.0.1 - - PUT /testdb1/ 201
> 09:09:24.033 [info] 127.0.0.1 - - HEAD /testdb2/ 404
> 09:09:24.046 [info] 127.0.0.1 - - PUT /testdb2/ 201
> 09:09:24.071 [info] 127.0.0.1 - - GET /_replicator/_all_docs?include_docs=true 200
> 09:09:28.110 [info] 127.0.0.1 - - PUT /_replicator/rep1 201
> 09:09:28.119 [info] 127.0.0.1 - - PUT /_replicator/rep2 201
> 09:09:28.121 [info] Attempting to start replication `23280770e617f3a82f398b8eca09aaef` (document `rep1`).
> 09:09:28.123 [info] Attempting to start replication `e42aaea4a0ceb931930834ecf7b79600` (document `rep2`).
> 09:09:28.169 [info] 127.0.0.1 - - HEAD /testdb2/ 200
> 09:09:28.172 [info] 127.0.0.1 - - GET /testdb2/ 200
> 09:09:28.176 [info] 127.0.0.1 - - GET /testdb2/_local/e42aaea4a0ceb931930834ecf7b79600 404
> 09:09:28.179 [info] 127.0.0.1 - - GET /testdb2/_local/f129a5531f82eb089a3e1ca9e80c9ad2 404
> 09:09:28.194 [info] Replication `"e42aaea4a0ceb931930834ecf7b79600"` is using:
>     4 worker processes
>     a worker batch size of 500
>     20 HTTP connections
>     a connection timeout of 30000 milliseconds
>     10 retries per request
>     socket options are: [{keepalive,true},{nodelay,false}]
> 09:09:28.196 [info] 127.0.0.1 - - GET /testdb2/_changes?feed=normal&style=all_docs&since=0&heartbeat=10000 200
> 09:09:28.202 [info] Document `rep2` triggered replication `e42aaea4a0ceb931930834ecf7b79600`
> 09:09:28.203 [info] starting new replication `e42aaea4a0ceb931930834ecf7b79600` at <0.262.0> (`http://localhost:15984/testdb2/` -> `testdb1`)
> 09:09:28.208 [info] 127.0.0.1 - - HEAD /testdb2/ 200
> 09:09:28.212 [info] 127.0.0.1 - - GET /testdb2/ 200
> 09:09:28.218 [info] 127.0.0.1 - - GET /testdb2/_local/23280770e617f3a82f398b8eca09aaef 404
> 09:09:28.219 [info] Replication `e42aaea4a0ceb931930834ecf7b79600` finished (triggered by document `rep2`)
> 09:09:28.223 [info] 127.0.0.1 - - GET /testdb2/_local/4b04e1e066f4ad1f988669036080ed9c 404
> 09:09:28.225 [info] Replication `"23280770e617f3a82f398b8eca09aaef"` is using:
>     4 worker processes
>     a worker batch size of 500
>     20 HTTP connections
>     a connection timeout of 30000 milliseconds
>     10 retries per request
>     socket options are: [{keepalive,true},{nodelay,false}]
> 09:09:58.203 [error] gen_server <0.287.0> terminated with reason: killed
> 09:09:58.207 [error] CRASH REPORT Process <0.287.0> with 0 neighbours crashed with reason: {killed,[{gen_server,terminate,6,[{file,"gen_server.erl"},{line,737}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}
> 09:09:58.215 [error] Error in replication `23280770e617f3a82f398b8eca09aaef` (triggered by document `rep1`): timeout
> Restarting replication in 5 seconds.
> 09:10:03.223 [info] 127.0.0.1 - - HEAD /testdb2/ 200
> 09:10:03.227 [info] 127.0.0.1 - - GET /testdb2/ 200
> 09:10:03.231 [info] 127.0.0.1 - - GET /testdb2/_local/23280770e617f3a82f398b8eca09aaef 404
> 09:10:03.235 [info] 127.0.0.1 - - GET /testdb2/_local/4b04e1e066f4ad1f988669036080ed9c 404
> 09:10:03.237 [info] Replication `"23280770e617f3a82f398b8eca09aaef"` is using:
>     4 worker processes
>     a worker batch size of 500
>     20 HTTP connections
>     a connection timeout of 30000 milliseconds
>     10 retries per request
>     socket options are: [{keepalive,true},{nodelay,false}]
> 09:10:03.244 [info] Document `rep1` triggered replication `23280770e617f3a82f398b8eca09aaef`
> 09:10:03.245 [info] starting new replication `23280770e617f3a82f398b8eca09aaef` at <0.335.0> (`testdb1` -> `http://localhost:15984/testdb2/`)
> 09:10:03.253 [info] Replication `23280770e617f3a82f398b8eca09aaef` finished (triggered by document `rep1`)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira