couchdb-dev mailing list archives

From "Adam Kocoloski (JIRA)" <j...@apache.org>
Subject [jira] Commented: (COUCHDB-197) Replication renders CouchDB unresponsive.
Date Mon, 05 Jan 2009 22:31:44 GMT

    [ https://issues.apache.org/jira/browse/COUCHDB-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12660961#action_12660961 ]

Adam Kocoloski commented on COUCHDB-197:
----------------------------------------

Hi Maximillian and devs, I've seen these errors on occasion as well, but haven't had time
to dig deep and figure out the best way to handle them.  Right now, couch_rep:do_http_request
doesn't try to match any of the errors that you saw after you upgraded the target:

{error,closed}
{error,econnreset}
{error,session_remotly_closed}
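
For illustration only, handling for errors like these could look roughly like the sketch below. It is Python rather than the Erlang in couch_rep, and the names `with_retries` and `TRANSIENT`, the mapping to Python exception classes, and the retry and backoff parameters are all assumptions made for the example, not anything taken from CouchDB. The idea is simply to treat a closed or reset connection as retryable instead of leaving it unmatched.

```python
import time

# Python's closest analogues of {error,closed} and {error,econnreset}.
# This mapping is an assumption made for the sketch.
TRANSIENT = (ConnectionResetError, ConnectionAbortedError, BrokenPipeError)


def with_retries(request, attempts=3, delay=0.5, sleep=time.sleep):
    """Call request() and retry it when it fails with a transient error."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return request()
        except TRANSIENT as err:
            last_error = err
            if attempt < attempts - 1:
                # Exponential backoff before the next try.
                sleep(delay * (2 ** attempt))
    # Every attempt failed with a transient error: surface the last one.
    raise last_error
```

Any other exception propagates immediately, so only the connection-level failures are retried.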

At first glance I'm not sure I understand why the replication still seems to be going forward.
I could be missing something simple.

For what it's worth, replication seems to work better if you initiate it on the target server.
That way, it's all GET requests underneath instead of POSTs. It's faster and puts a smaller
combined load on the servers, since the target Couch can use a single persistent TCP
connection to the source and pipeline its requests.
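
To make the difference concrete, here is a small sketch that builds both requests without sending them. The host names, port, and database name come from the report quoted below; the helper `replicate_request` is hypothetical and exists only for this example.

```python
import json


def replicate_request(initiator, source, target):
    """Return (url, body) for a POST to the initiator's /_replicate."""
    url = "http://%s:5984/_replicate" % initiator
    body = json.dumps({"source": source, "target": target})
    return url, body


db = "hulog_events"
src_host = "couchdb1.local.xxx"
dst_host = "couchdb2.local.xxx"

# Push, as in the report: POST to the source, with a remote target.
push = replicate_request(src_host, db, "http://%s:5984/%s" % (dst_host, db))

# Pull, initiated on the target: POST to the target, with a remote source.
pull = replicate_request(dst_host, "http://%s:5984/%s" % (src_host, db), db)

print(pull[0])
print(pull[1])
```

The only change between the two is which server receives the POST and which side of the source/target pair is given as a full URL.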

I'll try to take a closer look at this soon, but unfortunately it's not an easy one to reproduce.

Regards,

Adam

> Replication renders CouchDB unresponsive.
> -----------------------------------------
>
>                 Key: COUCHDB-197
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-197
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Database Core
>            Reporter: Maximillian Dornseif
>
> I am quite sure this is not the same issue as in COUCHDB-193.
> I'm trying to replicate a somewhat big database {"doc_count":541394,"doc_del_count":265692,"update_seq":2118390,"purge_seq":0,"compact_running":false,"disk_size":16552608803}
to another machine.
> I started replication with this:
> send: 'POST /_replicate HTTP/1.1\r\nHost: couchdb1.local.xxx:5984\r\nAccept-Encoding:
identity\r\ncontent-length: 90\r\ncontent-type: application/json\r\naccept: application/json\r\nuser-agent:
couchdb-python 0.5dev-r127\r\n\r\n'
> send: '{"source": "hulog_events", "target": "http://couchdb2.local.xxx:5984/hulog_events"}'
> reply: ''
> connect: (couchdb1.local.hudora.biz, 5984)
> send: 'POST /_replicate HTTP/1.1\r\nHost: couchdb1.local.xxxx:5984\r\nAccept-Encoding:
identity\r\ncontent-length: 90\r\ncontent-type: application/json\r\naccept: application/json\r\nuser-agent:
couchdb-python 0.5dev-r127\r\n\r\n'
> send: '{"source": "hulog_events", "target": "http://couchdb2.local.xxxx:5984/hulog_events"}'
> (no reply so far)
> On the source server (couchdb1) I see the following log entries:
> [Mon, 05 Jan 2009 19:34:21 GMT] [info] [<0.12745.45>] 192.168.0.30 - - 'POST' /_replicate
200
> [Mon, 05 Jan 2009 19:35:36 GMT] [info] [<0.107.0>] Compaction for db "hulog_events_test"
completed.
> [Mon, 05 Jan 2009 19:35:45 GMT] [info] [<0.12746.45>] 127.0.0.1 - - 'GET' /hulog_events/
200
> [Mon, 05 Jan 2009 19:35:46 GMT] [info] [<0.95.0>] Compaction for db "eap" completed.
> [Mon, 05 Jan 2009 19:42:17 GMT] [error] [<0.12765.45>] ** Generic server <0.12765.45>
terminating 
> ** Last message in was {'EXIT',<0.12762.45>,
>                         {timeout,
>                          {gen_server,call,
>                           [<0.12768.45>,
>                            {write,
>                             <<0,0,1,36,131,104,2,104,1,108,0,0,0,8,104,2,
>                               109,0,0,0,7,112,114,111,100,117,99,116,109,
>                               0,0,0,8,54,53,49,52,48,47,69,75,104,2,109,0,
>                               0,0,11,116,114,97,110,115,97,99,116,105,111,
>                               110,109,0,0,0,8,114,101,116,114,105,101,118,
>                               101,104,2,109,0,0,0,4,116,121,112,101,109,0,
>                               0,0,4,117,110,105,116,104,2,109,0,0,0,11,97,
>                               114,99,104,105,118,101,100,95,97,116,109,0,
>                               0,0,22,50,48,48,56,48,50,50,50,84,49,50,49,
>                               52,48,53,46,53,50,54,51,56,52,104,2,109,0,0,
>                               0,10,99,114,101,97,116,101,100,95,97,116,
>                               109,0,0,0,22,50,48,48,55,49,49,50,56,84,49,
>                               53,52,50,48,54,46,51,52,52,54,49,56,104,2,
>                               109,0,0,0,4,112,114,111,112,104,1,108,0,0,0,
>                               2,104,2,109,0,0,0,8,108,111,99,97,116,105,
>                               111,110,109,0,0,0,6,65,85,83,76,65,71,104,2,
>                               109,0,0,0,6,104,101,105,103,104,116,98,0,0,
>                               7,158,106,104,2,109,0,0,0,3,109,117,105,109,
>                               0,0,0,18,51,52,48,48,53,57,57,56,49,48,48,
>                               48,48,51,49,50,53,50,104,2,109,0,0,0,8,113,
>                               117,97,110,116,105,116,121,97,11,106,106>>}]}}}
> ** When Server state == {file_descriptor,prim_file,{#Port<0.904761>,24}}
> ** Reason for termination == 
> ** {timeout,{gen_server,call,
>                         [<0.12768.45>,
>                          {write,<<0,0,1,36,131,104,2,104,1,108,0,0,0,8,104,
>                                   2,109,0,0,0,7,112,114,111,100,117,99,116,
>                                   109,0,0,0,8,54,53,49,52,48,47,69,75,104,
>                                   2,109,0,0,0,11,116,114,97,110,115,97,99,
>                                   116,105,111,110,109,0,0,0,8,114,101,116,
>                                   114,105,101,118,101,104,2,109,0,0,0,4,
>                                   116,121,112,101,109,0,0,0,4,117,110,105,
>                                   116,104,2,109,0,0,0,11,97,114,99,104,105,
>                                   118,101,100,95,97,116,109,0,0,0,22,50,48,
>                                   48,56,48,50,50,50,84,49,50,49,52,48,53,
>                                   46,53,50,54,51,56,52,104,2,109,0,0,0,10,
>                                   99,114,101,97,116,101,100,95,97,116,109,
>                                   0,0,0,22,50,48,48,55,49,49,50,56,84,49,
>                                   53,52,50,48,54,46,51,52,52,54,49,56,104,
>                                   2,109,0,0,0,4,112,114,111,112,104,1,108,
>                                   0,0,0,2,104,2,109,0,0,0,8,108,111,99,97,
>                                   116,105,111,110,109,0,0,0,6,65,85,83,76,
>                                   65,71,104,2,109,0,0,0,6,104,101,105,103,
>                                   104,116,98,0,0,7,158,106,104,2,109,0,0,0,
>                                   3,109,117,105,109,0,0,0,18,51,52,48,48,
>                                   53,57,57,56,49,48,48,48,48,51,49,50,53,
>                                   50,104,2,109,0,0,0,8,113,117,97,110,116,
>                                   105,116,121,97,11,106,106>>}]}}
> [Mon, 05 Jan 2009 19:42:57 GMT] [error] [<0.12765.45>] {error_report,<0.22.0>,
>     {<0.12765.45>,crash_report,
>      [[{pid,<0.12765.45>},
>        {registered_name,[]},
>        {error_info,
>            {exit,
>                {timeout,
>                    {gen_server,call,
>                        [<0.12768.45>,
>                         {write,
>                             <<0,0,1,36,131,104,2,104,1,108,0,0,0,8,104,2,
>                               109,0,0,0,7,112,114,111,100,117,99,116,109,0,
>                               0,0,8,54,53,49,52,48,47,69,75,104,2,109,0,0,
>                               0,11,116,114,97,110,115,97,99,116,105,111,
>                               110,109,0,0,0,8,114,101,116,114,105,101,118,
>                               101,104,2,109,0,0,0,4,116,121,112,101,109,0,
>                               0,0,4,117,110,105,116,104,2,109,0,0,0,11,97,
>                               114,99,104,105,118,101,100,95,97,116,109,0,0,
>                               0,22,50,48,48,56,48,50,50,50,84,49,50,49,52,
>                               48,53,46,53,50,54,51,56,52,104,2,109,0,0,0,
>                               10,99,114,101,97,116,101,100,95,97,116,109,0,
>                               0,0,22,50,48,48,55,49,49,50,56,84,49,53,52,
>                               50,48,54,46,51,52,52,54,49,56,104,2,109,0,0,
>                               0,4,112,114,111,112,104,1,108,0,0,0,2,104,2,
>                               109,0,0,0,8,108,111,99,97,116,105,111,110,
>                               109,0,0,0,6,65,85,83,76,65,71,104,2,109,0,0,
>                               0,6,104,101,105,103,104,116,98,0,0,7,158,106,
>                               104,2,109,0,0,0,3,109,117,105,109,0,0,0,18,
>                               51,52,48,48,53,57,57,56,49,48,48,48,48,51,49,
>                               50,53,50,104,2,109,0,0,0,8,113,117,97,110,
>                               116,105,116,121,97,11,106,106>>}]}},
>                [{gen_server,terminate,6},{proc_lib,init_p_do_apply,3}]}},
>        {initial_call,{couch_file,init,['Argument__1']}},
>        {ancestors,[<0.12762.45>]},
>        {messages,[]},
>        {links,[#Port<0.904761>]},
>        {dictionary,[]},
>        {trap_exit,true},
>        {status,running},
>        {heap_size,987},
>        {stack_size,23},
>        {reductions,836156}],
>       []]}}
> [Mon, 05 Jan 2009 19:43:02 GMT] [error] [<0.22399.43>] ** Generic server <0.22399.43>
terminating 
> ** Last message in was {'EXIT',<0.10848.41>,
>                         {timeout,
>                          {gen_server,call,
>                           [<0.12768.45>,
>                            {write,
>                             <<0,0,1,36,131,104,2,104,1,108,0,0,0,8,104,2,
>                               109,0,0,0,7,112,114,111,100,117,99,116,109,
>                               0,0,0,8,54,53,49,52,48,47,69,75,104,2,109,0,
>                               0,0,11,116,114,97,110,115,97,99,116,105,111,
>                               110,109,0,0,0,8,114,101,116,114,105,101,118,
>                               101,104,2,109,0,0,0,4,116,121,112,101,109,0,
>                               0,0,4,117,110,105,116,104,2,109,0,0,0,11,97,
>                               114,99,104,105,118,101,100,95,97,116,109,0,
>                               0,0,22,50,48,48,56,48,50,50,50,84,49,50,49,
>                               52,48,53,46,53,50,54,51,56,52,104,2,109,0,0,
>                               0,10,99,114,101,97,116,101,100,95,97,116,
>                               109,0,0,0,22,50,48,48,55,49,49,50,56,84,49,
>                               53,52,50,48,54,46,51,52,52,54,49,56,104,2,
>                               109,0,0,0,4,112,114,111,112,104,1,108,0,0,0,
>                               2,104,2,109,0,0,0,8,108,111,99,97,116,105,
>                               111,110,109,0,0,0,6,65,85,83,76,65,71,104,2,
>                               109,0,0,0,6,104,101,105,103,104,116,98,0,0,
>                               7,158,106,104,2,109,0,0,0,3,109,117,105,109,
>                               0,0,0,18,51,52,48,48,53,57,57,56,49,48,48,
>                               48,48,51,49,50,53,50,104,2,109,0,0,0,8,113,
>                               117,97,110,116,105,116,121,97,11,106,106>>}]}}}
> ** When Server state == {file_descriptor,prim_file,{#Port<0.904494>,16}}
> ** Reason for termination == 
> ** {timeout,{gen_server,call,
>                         [<0.12768.45>,
>                          {write,<<0,0,1,36,131,104,2,104,1,108,0,0,0,8,104,
>                                   2,109,0,0,0,7,112,114,111,100,117,99,116,
>                                   109,0,0,0,8,54,53,49,52,48,47,69,75,104,
>                                   2,109,0,0,0,11,116,114,97,110,115,97,99,
>                                   116,105,111,110,109,0,0,0,8,114,101,116,
>                                   114,105,101,118,101,104,2,109,0,0,0,4,
>                                   116,121,112,101,109,0,0,0,4,117,110,105,
>                                   116,104,2,109,0,0,0,11,97,114,99,104,105,
>                                   118,101,100,95,97,116,109,0,0,0,22,50,48,
>                                   48,56,48,50,50,50,84,49,50,49,52,48,53,
>                                   46,53,50,54,51,56,52,104,2,109,0,0,0,10,
>                                   99,114,101,97,116,101,100,95,97,116,109,
>                                   0,0,0,22,50,48,48,55,49,49,50,56,84,49,
>                                   53,52,50,48,54,46,51,52,52,54,49,56,104,
>                                   2,109,0,0,0,4,112,114,111,112,104,1,108,
>                                   0,0,0,2,104,2,109,0,0,0,8,108,111,99,97,
>                                   116,105,111,110,109,0,0,0,6,65,85,83,76,
>                                   65,71,104,2,109,0,0,0,6,104,101,105,103,
>                                   104,116,98,0,0,7,158,106,104,2,109,0,0,0,
>                                   3,109,117,105,109,0,0,0,18,51,52,48,48,
>                                   53,57,57,56,49,48,48,48,48,51,49,50,53,
>                                   50,104,2,109,0,0,0,8,113,117,97,110,116,
>                                   105,116,121,97,11,106,106>>}]}}
> [Mon, 05 Jan 2009 19:43:28 GMT] [error] [<0.22399.43>] {error_report,<0.22.0>,
>     {<0.22399.43>,crash_report,
>      [[{pid,<0.22399.43>},
>        {registered_name,[]},
>        {error_info,
>            {exit,
>                {timeout,
>                    {gen_server,call,
>                        [<0.12768.45>,
>                         {write,
>                             <<0,0,1,36,131,104,2,104,1,108,0,0,0,8,104,2,
>                               109,0,0,0,7,112,114,111,100,117,99,116,109,0,
>                               0,0,8,54,53,49,52,48,47,69,75,104,2,109,0,0,
>                               0,11,116,114,97,110,115,97,99,116,105,111,
>                               110,109,0,0,0,8,114,101,116,114,105,101,118,
>                               101,104,2,109,0,0,0,4,116,121,112,101,109,0,
>                               0,0,4,117,110,105,116,104,2,109,0,0,0,11,97,
>                               114,99,104,105,118,101,100,95,97,116,109,0,0,
>                               0,22,50,48,48,56,48,50,50,50,84,49,50,49,52,
>                               48,53,46,53,50,54,51,56,52,104,2,109,0,0,0,
>                               10,99,114,101,97,116,101,100,95,97,116,109,0,
>                               0,0,22,50,48,48,55,49,49,50,56,84,49,53,52,
>                               50,48,54,46,51,52,52,54,49,56,104,2,109,0,0,
>                               0,4,112,114,111,112,104,1,108,0,0,0,2,104,2,
>                               109,0,0,0,8,108,111,99,97,116,105,111,110,
>                               109,0,0,0,6,65,85,83,76,65,71,104,2,109,0,0,
>                               0,6,104,101,105,103,104,116,98,0,0,7,158,106,
>                               104,2,109,0,0,0,3,109,117,105,109,0,0,0,18,
>                               51,52,48,48,53,57,57,56,49,48,48,48,48,51,49,
>                               50,53,50,104,2,109,0,0,0,8,113,117,97,110,
>                               116,105,116,121,97,11,106,106>>}]}},
>                [{gen_server,terminate,6},{proc_lib,init_p_do_apply,3}]}},
>        {initial_call,{couch_file,init,['Argument__1']}},
>        {ancestors,
>            [<0.10848.41>,<0.10847.41>,couch_server,couch_primary_services,
>             couch_server_sup,<0.1.0>]},
>        {messages,
>            [{'DOWN',#Ref<0.0.81.132266>,process,<0.10847.41>,
>                 {timeout,
>                     {gen_server,call,
>                         [<0.12768.45>,
>                          {write,
>                              <<0,0,1,36,131,104,2,104,1,108,0,0,0,8,104,
>                                2,109,0,0,0,7,112,114,111,100,117,99,116,
>                                109,0,0,0,8,54,53,49,52,48,47,69,75,104,
>                                2,109,0,0,0,11,116,114,97,110,115,97,99,
>                                116,105,111,110,109,0,0,0,8,114,101,116,
>                                114,105,101,118,101,104,2,109,0,0,0,4,
>                                116,121,112,101,109,0,0,0,4,117,110,105,
>                                116,104,2,109,0,0,0,11,97,114,99,104,105,
>                                118,101,100,95,97,116,109,0,0,0,22,50,48,
>                                48,56,48,50,50,50,84,49,50,49,52,48,53,
>                                46,53,50,54,51,56,52,104,2,109,0,0,0,10,
>                                99,114,101,97,116,101,100,95,97,116,109,
>                                0,0,0,22,50,48,48,55,49,49,50,56,84,49,
>                                53,52,50,48,54,46,51,52,52,54,49,56,104,
>                                2,109,0,0,0,4,112,114,111,112,104,1,108,
>                                0,0,0,2,104,2,109,0,0,0,8,108,111,99,97,
>                                116,105,111,110,109,0,0,0,6,65,85,83,76,
>                                65,71,104,2,109,0,0,0,6,104,101,105,103,
>                                104,116,98,0,0,7,158,106,104,2,109,0,0,0,
>                                3,109,117,105,109,0,0,0,18,51,52,48,48,
>                                53,57,57,56,49,48,48,48,48,51,49,50,53,
>                                50,104,2,109,0,0,0,8,113,117,97,110,116,
>                                105,116,121,97,11,106,106>>}]}}}]},
>        {links,[#Port<0.904494>]},
>        {dictionary,[{<0.10847.41>,{#Ref<0.0.81.132266>,1}}]},
>        {trap_exit,true},
>        {status,running},
>        {heap_size,987},
>        {stack_size,23},
>        {reductions,5627554}],
>       []]}}
> (and nothing further)
> I can still access couchdb1 (the source), but every trivial request takes exactly 5015 ms:
> balancer:/filespace/couchdb/log# time curl -i http://127.0.0.1:5984/
> HTTP/1.1 200 OK
> Server: CouchDB/0.9.0a731357-incubating (Erlang OTP/R12B)
> Date: Mon, 05 Jan 2009 20:45:46 GMT
> Content-Type: text/plain;charset=utf-8
> Content-Length: 102
> Cache-Control: must-revalidate
> {"couchdb":"Welcome","version":"0.9.0a731357-incubating","start_time":"Sun, 04 Jan 2009
21:43:13 GMT"}
> real	0m5.015s
> user	0m0.008s
> sys	0m0.000s
> No log entries are created on couchdb1 for these requests.
> Meanwhile, on the destination server (couchdb2) I can see a lot of activity:
> [Mon, 05 Jan 2009 20:47:58 GMT] [info] [<0.19601.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs
200
> ... 40 lines...
> [Mon, 05 Jan 2009 20:47:58 GMT] [info] [<0.19644.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs
200
> [Mon, 05 Jan 2009 20:48:13 GMT] [info] [<0.19652.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs
200
> ... ca 200 lines
> [Mon, 05 Jan 2009 20:48:13 GMT] [info] [<0.19744.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs
200
> [Mon, 05 Jan 2009 20:48:28 GMT] [info] [<0.19747.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs
200
> ... ca 200 lines
> [Mon, 05 Jan 2009 20:48:28 GMT] [info] [<0.19844.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs
200
> [Mon, 05 Jan 2009 20:48:43 GMT] [info] [<0.19944.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs
200
> ... ca 200 lines
> [Mon, 05 Jan 2009 20:48:58 GMT] [info] [<0.19948.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs
200
> ... ca 200 lines
> [Mon, 05 Jan 2009 20:48:58 GMT] [info] [<0.20044.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs
200
> [Mon, 05 Jan 2009 20:49:13 GMT] [info] [<0.20045.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs
200
> But the number of documents on the destination server hasn't been increasing in the meantime:
> {"db_name":"hulog_events","doc_count":25926,"doc_del_count":10074,"update_seq":36000,"purge_seq":0,"compact_running":false,"disk_size":21927524}
> couchdb1 is CouchDB 0.9.0a731357-incubating
> couchdb2 is CouchDB 0.9.0a730405-incubating

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

