couchdb-dev mailing list archives

From "Filipe Manana (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (COUCHDB-1461) replication timeout and loop
Date Fri, 25 May 2012 12:04:24 GMT

    [ https://issues.apache.org/jira/browse/COUCHDB-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283325#comment-13283325 ]

Filipe Manana commented on COUCHDB-1461:
----------------------------------------

Thanks for testing, Benjamin.
I will merge a small variant of that patch soon.
                
> replication timeout and loop
> ----------------------------
>
>                 Key: COUCHDB-1461
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1461
>             Project: CouchDB
>          Issue Type: Bug
>    Affects Versions: 1.2, 1.3
>            Reporter: Benoit Chesneau
>         Attachments: 12x-0001-Avoid-possible-timeout-initializing-replications.patch, master-0001-Avoid-possible-timeout-initializing-replications.patch, test.py
>
>
> When you run replications in both directions at the same time, the replication times out and then restarts after 5 seconds. Sometimes it does not recover cleanly. Adding a sleep between the two replications seems to avoid the problem, but it shouldn't be needed.
> Attached is a script using couchdbkit to reproduce the problem. SERVER_URI needs to be changed to point to your CouchDB node.
> Log:
> 09:09:24.016 [info] 127.0.0.1 - - HEAD /testdb1/ 404
> 09:09:24.028 [info] 127.0.0.1 - - PUT /testdb1/ 201
> 09:09:24.033 [info] 127.0.0.1 - - HEAD /testdb2/ 404
> 09:09:24.046 [info] 127.0.0.1 - - PUT /testdb2/ 201
> 09:09:24.071 [info] 127.0.0.1 - - GET /_replicator/_all_docs?include_docs=true 200
> 09:09:28.110 [info] 127.0.0.1 - - PUT /_replicator/rep1 201
> 09:09:28.119 [info] 127.0.0.1 - - PUT /_replicator/rep2 201
> 09:09:28.121 [info] Attempting to start replication `23280770e617f3a82f398b8eca09aaef` (document `rep1`).
> 09:09:28.123 [info] Attempting to start replication `e42aaea4a0ceb931930834ecf7b79600` (document `rep2`).
> 09:09:28.169 [info] 127.0.0.1 - - HEAD /testdb2/ 200
> 09:09:28.172 [info] 127.0.0.1 - - GET /testdb2/ 200
> 09:09:28.176 [info] 127.0.0.1 - - GET /testdb2/_local/e42aaea4a0ceb931930834ecf7b79600 404
> 09:09:28.179 [info] 127.0.0.1 - - GET /testdb2/_local/f129a5531f82eb089a3e1ca9e80c9ad2 404
> 09:09:28.194 [info] Replication `"e42aaea4a0ceb931930834ecf7b79600"` is using:
>        4 worker processes
>        a worker batch size of 500
>        20 HTTP connections
>        a connection timeout of 30000 milliseconds
>        10 retries per request
>        socket options are: [{keepalive,true},{nodelay,false}]
> 09:09:28.196 [info] 127.0.0.1 - - GET /testdb2/_changes?feed=normal&style=all_docs&since=0&heartbeat=10000 200
> 09:09:28.202 [info] Document `rep2` triggered replication `e42aaea4a0ceb931930834ecf7b79600`
> 09:09:28.203 [info] starting new replication `e42aaea4a0ceb931930834ecf7b79600` at <0.262.0> (`http://localhost:15984/testdb2/` -> `testdb1`)
> 09:09:28.208 [info] 127.0.0.1 - - HEAD /testdb2/ 200
> 09:09:28.212 [info] 127.0.0.1 - - GET /testdb2/ 200
> 09:09:28.218 [info] 127.0.0.1 - - GET /testdb2/_local/23280770e617f3a82f398b8eca09aaef 404
> 09:09:28.219 [info] Replication `e42aaea4a0ceb931930834ecf7b79600` finished (triggered by document `rep2`)
> 09:09:28.223 [info] 127.0.0.1 - - GET /testdb2/_local/4b04e1e066f4ad1f988669036080ed9c 404
> 09:09:28.225 [info] Replication `"23280770e617f3a82f398b8eca09aaef"` is using:
>        4 worker processes
>        a worker batch size of 500
>        20 HTTP connections
>        a connection timeout of 30000 milliseconds
>        10 retries per request
>        socket options are: [{keepalive,true},{nodelay,false}]
> 09:09:58.203 [error] gen_server <0.287.0> terminated with reason: killed
> 09:09:58.207 [error] CRASH REPORT Process <0.287.0> with 0 neighbours crashed with reason: {killed,[{gen_server,terminate,6,[{file,"gen_server.erl"},{line,737}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}
> 09:09:58.215 [error] Error in replication `23280770e617f3a82f398b8eca09aaef` (triggered by document `rep1`): timeout
> Restarting replication in 5 seconds.
> 09:10:03.223 [info] 127.0.0.1 - - HEAD /testdb2/ 200
> 09:10:03.227 [info] 127.0.0.1 - - GET /testdb2/ 200
> 09:10:03.231 [info] 127.0.0.1 - - GET /testdb2/_local/23280770e617f3a82f398b8eca09aaef 404
> 09:10:03.235 [info] 127.0.0.1 - - GET /testdb2/_local/4b04e1e066f4ad1f988669036080ed9c 404
> 09:10:03.237 [info] Replication `"23280770e617f3a82f398b8eca09aaef"` is using:
>        4 worker processes
>        a worker batch size of 500
>        20 HTTP connections
>        a connection timeout of 30000 milliseconds
>        10 retries per request
>        socket options are: [{keepalive,true},{nodelay,false}]
> 09:10:03.244 [info] Document `rep1` triggered replication `23280770e617f3a82f398b8eca09aaef`
> 09:10:03.245 [info] starting new replication `23280770e617f3a82f398b8eca09aaef` at <0.335.0> (`testdb1` -> `http://localhost:15984/testdb2/`)
> 09:10:03.253 [info] Replication `23280770e617f3a82f398b8eca09aaef` finished (triggered by document `rep1`)
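
For readers without the attached test.py, the scenario in the quoted report can be approximated with the sketch below. It is not the attached script: it uses only Python's standard library instead of couchdbkit, and the helper names (replication_docs, put_json, trigger) are hypothetical. The database names, document ids and replication directions are taken from the log; the value of SERVER_URI is an assumption based on the port shown there and must point at your own node.

```python
import json
import urllib.request

# Assumption: address of the CouchDB node (the log shows port 15984).
SERVER_URI = "http://localhost:15984"


def replication_docs(server_uri):
    """Build the two _replicator documents, one per direction."""
    remote_db2 = server_uri.rstrip("/") + "/testdb2/"
    return {
        # rep1: local testdb1 -> testdb2 addressed over HTTP
        "rep1": {"source": "testdb1", "target": remote_db2},
        # rep2: testdb2 addressed over HTTP -> local testdb1
        "rep2": {"source": remote_db2, "target": "testdb1"},
    }


def put_json(url, body=None):
    """Send a PUT request, with an optional JSON body, and return the HTTP status."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method="PUT")
    req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req) as resp:
        return resp.status


def trigger(server_uri=SERVER_URI):
    """Create both databases, then store both replication documents back to back."""
    base = server_uri.rstrip("/")
    for db in ("testdb1", "testdb2"):
        # Raises urllib.error.HTTPError if the database already exists.
        put_json(base + "/" + db + "/")
    for doc_id, doc in replication_docs(base).items():
        # No sleep between the two PUTs: both replications start almost together.
        put_json(base + "/_replicator/" + doc_id, doc)
```

Calling trigger() against a test node stores rep1 and rep2 with no delay in between, which is the condition the report describes. Whether the timeout actually appears will depend on the CouchDB version and on timing.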

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
