couchdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jason Davies (JIRA)" <j...@apache.org>
Subject [jira] Updated: (COUCHDB-197) Replication renders CouchDB unresponsive.
Date Mon, 26 Jan 2009 21:02:00 GMT

     [ https://issues.apache.org/jira/browse/COUCHDB-197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jason Davies updated COUCHDB-197:
---------------------------------

    Attachment: ibrowse.diff

This patch integrates ibrowse 1.4.1 (from http://sourceforge.net/cvs/?group_id=73688) with
couch_rep.erl.  It mostly seems to have cleaned up my strange problems, although I do still
get occasional connection_closed errors.

I've made it retry requests up to 10 times if errors are encountered, and I've limited this
to GET requests on Adam Kocoloski's suggestion.

This patch also includes my couch_redirects.3 patch.

Apparently ibrowse is "dual-licensed" under LGPL and BSD so I think it's safe...

> Replication renders CouchDB unresponsive.
> -----------------------------------------
>
>                 Key: COUCHDB-197
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-197
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Database Core
>    Affects Versions: 0.9
>            Reporter: Maximillian Dornseif
>            Priority: Blocker
>             Fix For: 0.9
>
>         Attachments: couch_redirects.2.diff, couch_redirects.3.diff, couch_redirects.diff,
couch_tests.js.diff, ibrowse.diff, push_replication_fix.diff
>
>
> I am quite sure this is not the same issue as in COUCHDB-193.
> Im trying to replicte a somewhat big database {"doc_count":541394,"doc_del_count":265692,"update_seq":2118390,"purge_seq":0,"compact_running":false,"disk_size":16552608803}
to an other machine. 
> I started replication with this:
> send: 'POST /_replicate HTTP/1.1\r\nHost: couchdb1.local.xxx:5984\r\nAccept-Encoding:
identity\r\ncontent-length: 90\r\ncontent-type: application/json\r\naccept: application/json\r\nuser-agent:
couchdb-python 0.5dev-r127\r\n\r\n'
> send: '{"source": "hulog_events", "target": "http://couchdb2.local.xxx:5984/hulog_events"}'
> reply: ''
> connect: (couchdb1.local.hudora.biz, 5984)
> send: 'POST /_replicate HTTP/1.1\r\nHost: couchdb1.local.xxxx:5984\r\nAccept-Encoding:
identity\r\ncontent-length: 90\r\ncontent-type: application/json\r\naccept: application/json\r\nuser-agent:
couchdb-python 0.5dev-r127\r\n\r\n'
> send: '{"source": "hulog_events", "target": "http://couchdb2.local.xxxx:5984/hulog_events"}'
> (no reply so far)
> On the source server (couchdb1) I see following logentries:
> Mon, 05 Jan 2009 19:34:21 GMT] [info] [<0.12745.45>] 192.168.0.30 - - 'POST' /_replicate
200
> [Mon, 05 Jan 2009 19:35:36 GMT] [info] [<0.107.0>] Compaction for db "hulog_events_test"
completed.
> [Mon, 05 Jan 2009 19:35:45 GMT] [info] [<0.12746.45>] 127.0.0.1 - - 'GET' /hulog_events/
200
> [Mon, 05 Jan 2009 19:35:46 GMT] [info] [<0.95.0>] Compaction for db "eap" completed.
> [Mon, 05 Jan 2009 19:42:17 GMT] [error] [<0.12765.45>] ** Generic server <0.12765.45>
terminating 
> ** Last message in was {'EXIT',<0.12762.45>,
>                         {timeout,
>                          {gen_server,call,
>                           [<0.12768.45>,
>                            {write,
>                             <<0,0,1,36,131,104,2,104,1,108,0,0,0,8,104,2,
>                               109,0,0,0,7,112,114,111,100,117,99,116,109,
>                               0,0,0,8,54,53,49,52,48,47,69,75,104,2,109,0,
>                               0,0,11,116,114,97,110,115,97,99,116,105,111,
>                               110,109,0,0,0,8,114,101,116,114,105,101,118,
>                               101,104,2,109,0,0,0,4,116,121,112,101,109,0,
>                               0,0,4,117,110,105,116,104,2,109,0,0,0,11,97,
>                               114,99,104,105,118,101,100,95,97,116,109,0,
>                               0,0,22,50,48,48,56,48,50,50,50,84,49,50,49,
>                               52,48,53,46,53,50,54,51,56,52,104,2,109,0,0,
>                               0,10,99,114,101,97,116,101,100,95,97,116,
>                               109,0,0,0,22,50,48,48,55,49,49,50,56,84,49,
>                               53,52,50,48,54,46,51,52,52,54,49,56,104,2,
>                               109,0,0,0,4,112,114,111,112,104,1,108,0,0,0,
>                               2,104,2,109,0,0,0,8,108,111,99,97,116,105,
>                               111,110,109,0,0,0,6,65,85,83,76,65,71,104,2,
>                               109,0,0,0,6,104,101,105,103,104,116,98,0,0,
>                               7,158,106,104,2,109,0,0,0,3,109,117,105,109,
>                               0,0,0,18,51,52,48,48,53,57,57,56,49,48,48,
>                               48,48,51,49,50,53,50,104,2,109,0,0,0,8,113,
>                               117,97,110,116,105,116,121,97,11,106,106>>}]}}}
> ** When Server state == {file_descriptor,prim_file,{#Port<0.904761>,24}}
> ** Reason for termination == 
> ** {timeout,{gen_server,call,
>                         [<0.12768.45>,
>                          {write,<<0,0,1,36,131,104,2,104,1,108,0,0,0,8,104,
>                                   2,109,0,0,0,7,112,114,111,100,117,99,116,
>                                   109,0,0,0,8,54,53,49,52,48,47,69,75,104,
>                                   2,109,0,0,0,11,116,114,97,110,115,97,99,
>                                   116,105,111,110,109,0,0,0,8,114,101,116,
>                                   114,105,101,118,101,104,2,109,0,0,0,4,
>                                   116,121,112,101,109,0,0,0,4,117,110,105,
>                                   116,104,2,109,0,0,0,11,97,114,99,104,105,
>                                   118,101,100,95,97,116,109,0,0,0,22,50,48,
>                                   48,56,48,50,50,50,84,49,50,49,52,48,53,
>                                   46,53,50,54,51,56,52,104,2,109,0,0,0,10,
>                                   99,114,101,97,116,101,100,95,97,116,109,
>                                   0,0,0,22,50,48,48,55,49,49,50,56,84,49,
>                                   53,52,50,48,54,46,51,52,52,54,49,56,104,
>                                   2,109,0,0,0,4,112,114,111,112,104,1,108,
>                                   0,0,0,2,104,2,109,0,0,0,8,108,111,99,97,
>                                   116,105,111,110,109,0,0,0,6,65,85,83,76,
>                                   65,71,104,2,109,0,0,0,6,104,101,105,103,
>                                   104,116,98,0,0,7,158,106,104,2,109,0,0,0,
>                                   3,109,117,105,109,0,0,0,18,51,52,48,48,
>                                   53,57,57,56,49,48,48,48,48,51,49,50,53,
>                                   50,104,2,109,0,0,0,8,113,117,97,110,116,
>                                   105,116,121,97,11,106,106>>}]}}
> [Mon, 05 Jan 2009 19:42:57 GMT] [error] [<0.12765.45>] {error_report,<0.22.0>,
>     {<0.12765.45>,crash_report,
>      [[{pid,<0.12765.45>},
>        {registered_name,[]},
>        {error_info,
>            {exit,
>                {timeout,
>                    {gen_server,call,
>                        [<0.12768.45>,
>                         {write,
>                             <<0,0,1,36,131,104,2,104,1,108,0,0,0,8,104,2,
>                               109,0,0,0,7,112,114,111,100,117,99,116,109,0,
>                               0,0,8,54,53,49,52,48,47,69,75,104,2,109,0,0,
>                               0,11,116,114,97,110,115,97,99,116,105,111,
>                               110,109,0,0,0,8,114,101,116,114,105,101,118,
>                               101,104,2,109,0,0,0,4,116,121,112,101,109,0,
>                               0,0,4,117,110,105,116,104,2,109,0,0,0,11,97,
>                               114,99,104,105,118,101,100,95,97,116,109,0,0,
>                               0,22,50,48,48,56,48,50,50,50,84,49,50,49,52,
>                               48,53,46,53,50,54,51,56,52,104,2,109,0,0,0,
>                               10,99,114,101,97,116,101,100,95,97,116,109,0,
>                               0,0,22,50,48,48,55,49,49,50,56,84,49,53,52,
>                               50,48,54,46,51,52,52,54,49,56,104,2,109,0,0,
>                               0,4,112,114,111,112,104,1,108,0,0,0,2,104,2,
>                               109,0,0,0,8,108,111,99,97,116,105,111,110,
>                               109,0,0,0,6,65,85,83,76,65,71,104,2,109,0,0,
>                               0,6,104,101,105,103,104,116,98,0,0,7,158,106,
>                               104,2,109,0,0,0,3,109,117,105,109,0,0,0,18,
>                               51,52,48,48,53,57,57,56,49,48,48,48,48,51,49,
>                               50,53,50,104,2,109,0,0,0,8,113,117,97,110,
>                               116,105,116,121,97,11,106,106>>}]}},
>                [{gen_server,terminate,6},{proc_lib,init_p_do_apply,3}]}},
>        {initial_call,{couch_file,init,['Argument__1']}},
>        {ancestors,[<0.12762.45>]},
>        {messages,[]},
>        {links,[#Port<0.904761>]},
>        {dictionary,[]},
>        {trap_exit,true},
>        {status,running},
>        {heap_size,987},
>        {stack_size,23},
>        {reductions,836156}],
>       []]}}
> [Mon, 05 Jan 2009 19:43:02 GMT] [error] [<0.22399.43>] ** Generic server <0.22399.43>
terminating 
> ** Last message in was {'EXIT',<0.10848.41>,
>                         {timeout,
>                          {gen_server,call,
>                           [<0.12768.45>,
>                            {write,
>                             <<0,0,1,36,131,104,2,104,1,108,0,0,0,8,104,2,
>                               109,0,0,0,7,112,114,111,100,117,99,116,109,
>                               0,0,0,8,54,53,49,52,48,47,69,75,104,2,109,0,
>                               0,0,11,116,114,97,110,115,97,99,116,105,111,
>                               110,109,0,0,0,8,114,101,116,114,105,101,118,
>                               101,104,2,109,0,0,0,4,116,121,112,101,109,0,
>                               0,0,4,117,110,105,116,104,2,109,0,0,0,11,97,
>                               114,99,104,105,118,101,100,95,97,116,109,0,
>                               0,0,22,50,48,48,56,48,50,50,50,84,49,50,49,
>                               52,48,53,46,53,50,54,51,56,52,104,2,109,0,0,
>                               0,10,99,114,101,97,116,101,100,95,97,116,
>                               109,0,0,0,22,50,48,48,55,49,49,50,56,84,49,
>                               53,52,50,48,54,46,51,52,52,54,49,56,104,2,
>                               109,0,0,0,4,112,114,111,112,104,1,108,0,0,0,
>                               2,104,2,109,0,0,0,8,108,111,99,97,116,105,
>                               111,110,109,0,0,0,6,65,85,83,76,65,71,104,2,
>                               109,0,0,0,6,104,101,105,103,104,116,98,0,0,
>                               7,158,106,104,2,109,0,0,0,3,109,117,105,109,
>                               0,0,0,18,51,52,48,48,53,57,57,56,49,48,48,
>                               48,48,51,49,50,53,50,104,2,109,0,0,0,8,113,
>                               117,97,110,116,105,116,121,97,11,106,106>>}]}}}
> ** When Server state == {file_descriptor,prim_file,{#Port<0.904494>,16}}
> ** Reason for termination == 
> ** {timeout,{gen_server,call,
>                         [<0.12768.45>,
>                          {write,<<0,0,1,36,131,104,2,104,1,108,0,0,0,8,104,
>                                   2,109,0,0,0,7,112,114,111,100,117,99,116,
>                                   109,0,0,0,8,54,53,49,52,48,47,69,75,104,
>                                   2,109,0,0,0,11,116,114,97,110,115,97,99,
>                                   116,105,111,110,109,0,0,0,8,114,101,116,
>                                   114,105,101,118,101,104,2,109,0,0,0,4,
>                                   116,121,112,101,109,0,0,0,4,117,110,105,
>                                   116,104,2,109,0,0,0,11,97,114,99,104,105,
>                                   118,101,100,95,97,116,109,0,0,0,22,50,48,
>                                   48,56,48,50,50,50,84,49,50,49,52,48,53,
>                                   46,53,50,54,51,56,52,104,2,109,0,0,0,10,
>                                   99,114,101,97,116,101,100,95,97,116,109,
>                                   0,0,0,22,50,48,48,55,49,49,50,56,84,49,
>                                   53,52,50,48,54,46,51,52,52,54,49,56,104,
>                                   2,109,0,0,0,4,112,114,111,112,104,1,108,
>                                   0,0,0,2,104,2,109,0,0,0,8,108,111,99,97,
>                                   116,105,111,110,109,0,0,0,6,65,85,83,76,
>                                   65,71,104,2,109,0,0,0,6,104,101,105,103,
>                                   104,116,98,0,0,7,158,106,104,2,109,0,0,0,
>                                   3,109,117,105,109,0,0,0,18,51,52,48,48,
>                                   53,57,57,56,49,48,48,48,48,51,49,50,53,
>                                   50,104,2,109,0,0,0,8,113,117,97,110,116,
>                                   105,116,121,97,11,106,106>>}]}}
> [Mon, 05 Jan 2009 19:43:28 GMT] [error] [<0.22399.43>] {error_report,<0.22.0>,
>     {<0.22399.43>,crash_report,
>      [[{pid,<0.22399.43>},
>        {registered_name,[]},
>        {error_info,
>            {exit,
>                {timeout,
>                    {gen_server,call,
>                        [<0.12768.45>,
>                         {write,
>                             <<0,0,1,36,131,104,2,104,1,108,0,0,0,8,104,2,
>                               109,0,0,0,7,112,114,111,100,117,99,116,109,0,
>                               0,0,8,54,53,49,52,48,47,69,75,104,2,109,0,0,
>                               0,11,116,114,97,110,115,97,99,116,105,111,
>                               110,109,0,0,0,8,114,101,116,114,105,101,118,
>                               101,104,2,109,0,0,0,4,116,121,112,101,109,0,
>                               0,0,4,117,110,105,116,104,2,109,0,0,0,11,97,
>                               114,99,104,105,118,101,100,95,97,116,109,0,0,
>                               0,22,50,48,48,56,48,50,50,50,84,49,50,49,52,
>                               48,53,46,53,50,54,51,56,52,104,2,109,0,0,0,
>                               10,99,114,101,97,116,101,100,95,97,116,109,0,
>                               0,0,22,50,48,48,55,49,49,50,56,84,49,53,52,
>                               50,48,54,46,51,52,52,54,49,56,104,2,109,0,0,
>                               0,4,112,114,111,112,104,1,108,0,0,0,2,104,2,
>                               109,0,0,0,8,108,111,99,97,116,105,111,110,
>                               109,0,0,0,6,65,85,83,76,65,71,104,2,109,0,0,
>                               0,6,104,101,105,103,104,116,98,0,0,7,158,106,
>                               104,2,109,0,0,0,3,109,117,105,109,0,0,0,18,
>                               51,52,48,48,53,57,57,56,49,48,48,48,48,51,49,
>                               50,53,50,104,2,109,0,0,0,8,113,117,97,110,
>                               116,105,116,121,97,11,106,106>>}]}},
>                [{gen_server,terminate,6},{proc_lib,init_p_do_apply,3}]}},
>        {initial_call,{couch_file,init,['Argument__1']}},
>        {ancestors,
>            [<0.10848.41>,<0.10847.41>,couch_server,couch_primary_services,
>             couch_server_sup,<0.1.0>]},
>        {messages,
>            [{'DOWN',#Ref<0.0.81.132266>,process,<0.10847.41>,
>                 {timeout,
>                     {gen_server,call,
>                         [<0.12768.45>,
>                          {write,
>                              <<0,0,1,36,131,104,2,104,1,108,0,0,0,8,104,
>                                2,109,0,0,0,7,112,114,111,100,117,99,116,
>                                109,0,0,0,8,54,53,49,52,48,47,69,75,104,
>                                2,109,0,0,0,11,116,114,97,110,115,97,99,
>                                116,105,111,110,109,0,0,0,8,114,101,116,
>                                114,105,101,118,101,104,2,109,0,0,0,4,
>                                116,121,112,101,109,0,0,0,4,117,110,105,
>                                116,104,2,109,0,0,0,11,97,114,99,104,105,
>                                118,101,100,95,97,116,109,0,0,0,22,50,48,
>                                48,56,48,50,50,50,84,49,50,49,52,48,53,
>                                46,53,50,54,51,56,52,104,2,109,0,0,0,10,
>                                99,114,101,97,116,101,100,95,97,116,109,
>                                0,0,0,22,50,48,48,55,49,49,50,56,84,49,
>                                53,52,50,48,54,46,51,52,52,54,49,56,104,
>                                2,109,0,0,0,4,112,114,111,112,104,1,108,
>                                0,0,0,2,104,2,109,0,0,0,8,108,111,99,97,
>                                116,105,111,110,109,0,0,0,6,65,85,83,76,
>                                65,71,104,2,109,0,0,0,6,104,101,105,103,
>                                104,116,98,0,0,7,158,106,104,2,109,0,0,0,
>                                3,109,117,105,109,0,0,0,18,51,52,48,48,
>                                53,57,57,56,49,48,48,48,48,51,49,50,53,
>                                50,104,2,109,0,0,0,8,113,117,97,110,116,
>                                105,116,121,97,11,106,106>>}]}}}]},
>        {links,[#Port<0.904494>]},
>        {dictionary,[{<0.10847.41>,{#Ref<0.0.81.132266>,1}}]},
>        {trap_exit,true},
>        {status,running},
>        {heap_size,987},
>        {stack_size,23},
>        {reductions,5627554}],
>       []]}}
> (and nothing further)
> I still can access couchdb1 (the source) but every trivial request takes exactly 5015ms:
> balancer:/filespace/couchdb/log# time curl -i http://127.0.0.1:5984/
> HTTP/1.1 200 OK
> Server: CouchDB/0.9.0a731357-incubating (Erlang OTP/R12B)
> Date: Mon, 05 Jan 2009 20:45:46 GMT
> Content-Type: text/plain;charset=utf-8
> Content-Length: 102
> Cache-Control: must-revalidate
> {"couchdb":"Welcome","version":"0.9.0a731357-incubating","start_time":"Sun, 04 Jan 2009
21:43:13 GMT"}
> real	0m5.015s
> user	0m0.008s
> sys	0m0.000s
> For these accesses no log entries  on couchdb1 are created.
> Meanwhile on the destination server (couchdb2) I can see lot of activity:
> [Mon, 05 Jan 2009 20:47:58 GMT] [info] [<0.19601.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs
200
> ... 40 lines...
> [Mon, 05 Jan 2009 20:47:58 GMT] [info] [<0.19644.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs
200
> [Mon, 05 Jan 2009 20:48:13 GMT] [info] [<0.19652.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs
200
> ... ca 200 lines
> [Mon, 05 Jan 2009 20:48:13 GMT] [info] [<0.19744.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs
200
> [Mon, 05 Jan 2009 20:48:28 GMT] [info] [<0.19747.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs
200
> ... ca 200 lines
> [Mon, 05 Jan 2009 20:48:28 GMT] [info] [<0.19844.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs
200
> [Mon, 05 Jan 2009 20:48:43 GMT] [info] [<0.19944.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs
200
> ... ca 200 lines
> [Mon, 05 Jan 2009 20:48:58 GMT] [info] [<0.19948.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs
200
> ... ca 200 lines
> [Mon, 05 Jan 2009 20:48:58 GMT] [info] [<0.20044.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs
200
> [Mon, 05 Jan 2009 20:49:13 GMT] [info] [<0.20045.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs
200
> But the number of documents on the destination servers hasnt been incerasing in the meantime:
> {"db_name":"hulog_events","doc_count":25926,"doc_del_count":10074,"update_seq":36000,"purge_seq":0,"compact_running":false,"disk_size":21927524}
> couchdb1 is CouchDB 0.9.0a731357-incubating
> couchdb2 is CouchDB 0.9.0a730405-incubating

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message