couchdb-dev mailing list archives

From "Adam Kocoloski (JIRA)" <j...@apache.org>
Subject [jira] Commented: (COUCHDB-197) Replication renders CouchDB unresponsive.
Date Fri, 23 Jan 2009 20:31:59 GMT

    [ https://issues.apache.org/jira/browse/COUCHDB-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12666688#action_12666688
] 

Adam Kocoloski commented on COUCHDB-197:
----------------------------------------

Jason helped me look into this in more detail.  As Damien noted, the no_scheme error
means the URI in the HTTP request had no http: or https: scheme.  This happens during replication
because of the 301 redirect on _design%2F URLs (remember, couch_rep encodes all docids, including
those of design docs -- that may not be necessary anymore).  CouchDB's redirect response includes a
Location header without any scheme.  Section 14.30 of RFC 2616 requires an absolute URI there,
so that's illegal.
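
For illustration only, here is a minimal Python sketch (not CouchDB or couch_rep code) of how
a client could tolerate a scheme-less Location value by resolving it against the URL it just
requested.  The function name, host name, and the path-only form of the header are assumptions
made for the example; the exact header value CouchDB sent is not shown in this thread.

```python
from urllib.parse import urljoin, urlsplit


def absolute_location(request_url, location):
    """Return an absolute URI for a redirect target.

    If `location` already carries an http/https scheme it is returned
    unchanged; otherwise it is resolved relative to `request_url`.
    """
    if urlsplit(location).scheme in ("http", "https"):
        return location
    # Hypothetical case: the server sent only a path, e.g. "/db/_design/foo".
    return urljoin(request_url, location)


if __name__ == "__main__":
    # Hypothetical host and database names, for illustration only.
    print(absolute_location(
        "http://couchdb2.example:5984/db/_design%2Ffoo",
        "/db/_design/foo",
    ))
```

This only papers over the problem on the client side; the server-side fix is still to emit an
absolute URI in the Location header.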

I hacked in an absolute URI in the Location header, but there may be additional problems.
The Erlang HTTP client hangs after automatically handling one of these redirects.  Eventually,
the remote CouchDB server closes the connections, but that kills all the requests in the pipeline.
I don't have a patch ready, but some options for working around this limitation
include:

a) handling the redirect manually in the replication code instead of letting the HTTP client
do it automatically

b) retrying GET requests in the event of failures (we could do this anyway to try to provide
a smoother experience)
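
As a rough sketch of how (a) and (b) could fit together, the Python below follows redirects by
hand and retries a GET when the connection fails.  It is not the replicator's code: the `fetch`
callable, its (status, headers, body) return shape, the "Location" key spelling, and the retry
and redirect limits are all assumptions made for the example.

```python
from urllib.parse import urljoin


def get_with_retry(fetch, url, max_redirects=5, max_retries=3):
    """GET `url`, following redirects manually and retrying on failure.

    `fetch(url)` is a hypothetical callable that performs one GET without
    following redirects and returns (status, headers, body); it raises
    ConnectionError when the connection is dropped.
    """
    redirects = 0
    failures = 0
    while True:
        try:
            status, headers, body = fetch(url)
        except ConnectionError:
            # Option (b): GET is idempotent, so it is safe to try again.
            failures += 1
            if failures > max_retries:
                raise
            continue
        if status in (301, 302, 307) and "Location" in headers:
            # Option (a): handle the redirect ourselves, resolving a
            # scheme-less Location against the URL we just requested.
            redirects += 1
            if redirects > max_redirects:
                raise RuntimeError("too many redirects")
            url = urljoin(url, headers["Location"])
            continue
        return status, body
```

Passing the transport in as `fetch` keeps the redirect and retry policy separate from the HTTP
client, so the same loop can be exercised with a stub.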

I haven't had a chance to dig into the more frequent errors encountered in POST-based replications.

Best, Adam

> Replication renders CouchDB unresponsive.
> -----------------------------------------
>
>                 Key: COUCHDB-197
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-197
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Database Core
>            Reporter: Maximillian Dornseif
>
> I am quite sure this is not the same issue as in COUCHDB-193.
> I'm trying to replicate a fairly big database {"doc_count":541394,"doc_del_count":265692,"update_seq":2118390,"purge_seq":0,"compact_running":false,"disk_size":16552608803}
> to another machine.
> I started replication with this:
> send: 'POST /_replicate HTTP/1.1\r\nHost: couchdb1.local.xxx:5984\r\nAccept-Encoding:
identity\r\ncontent-length: 90\r\ncontent-type: application/json\r\naccept: application/json\r\nuser-agent:
couchdb-python 0.5dev-r127\r\n\r\n'
> send: '{"source": "hulog_events", "target": "http://couchdb2.local.xxx:5984/hulog_events"}'
> reply: ''
> connect: (couchdb1.local.hudora.biz, 5984)
> send: 'POST /_replicate HTTP/1.1\r\nHost: couchdb1.local.xxxx:5984\r\nAccept-Encoding:
identity\r\ncontent-length: 90\r\ncontent-type: application/json\r\naccept: application/json\r\nuser-agent:
couchdb-python 0.5dev-r127\r\n\r\n'
> send: '{"source": "hulog_events", "target": "http://couchdb2.local.xxxx:5984/hulog_events"}'
> (no reply so far)
> On the source server (couchdb1) I see the following log entries:
> [Mon, 05 Jan 2009 19:34:21 GMT] [info] [<0.12745.45>] 192.168.0.30 - - 'POST' /_replicate 200
> [Mon, 05 Jan 2009 19:35:36 GMT] [info] [<0.107.0>] Compaction for db "hulog_events_test"
completed.
> [Mon, 05 Jan 2009 19:35:45 GMT] [info] [<0.12746.45>] 127.0.0.1 - - 'GET' /hulog_events/
200
> [Mon, 05 Jan 2009 19:35:46 GMT] [info] [<0.95.0>] Compaction for db "eap" completed.
> [Mon, 05 Jan 2009 19:42:17 GMT] [error] [<0.12765.45>] ** Generic server <0.12765.45>
terminating 
> ** Last message in was {'EXIT',<0.12762.45>,
>                         {timeout,
>                          {gen_server,call,
>                           [<0.12768.45>,
>                            {write,
>                             <<0,0,1,36,131,104,2,104,1,108,0,0,0,8,104,2,
>                               109,0,0,0,7,112,114,111,100,117,99,116,109,
>                               0,0,0,8,54,53,49,52,48,47,69,75,104,2,109,0,
>                               0,0,11,116,114,97,110,115,97,99,116,105,111,
>                               110,109,0,0,0,8,114,101,116,114,105,101,118,
>                               101,104,2,109,0,0,0,4,116,121,112,101,109,0,
>                               0,0,4,117,110,105,116,104,2,109,0,0,0,11,97,
>                               114,99,104,105,118,101,100,95,97,116,109,0,
>                               0,0,22,50,48,48,56,48,50,50,50,84,49,50,49,
>                               52,48,53,46,53,50,54,51,56,52,104,2,109,0,0,
>                               0,10,99,114,101,97,116,101,100,95,97,116,
>                               109,0,0,0,22,50,48,48,55,49,49,50,56,84,49,
>                               53,52,50,48,54,46,51,52,52,54,49,56,104,2,
>                               109,0,0,0,4,112,114,111,112,104,1,108,0,0,0,
>                               2,104,2,109,0,0,0,8,108,111,99,97,116,105,
>                               111,110,109,0,0,0,6,65,85,83,76,65,71,104,2,
>                               109,0,0,0,6,104,101,105,103,104,116,98,0,0,
>                               7,158,106,104,2,109,0,0,0,3,109,117,105,109,
>                               0,0,0,18,51,52,48,48,53,57,57,56,49,48,48,
>                               48,48,51,49,50,53,50,104,2,109,0,0,0,8,113,
>                               117,97,110,116,105,116,121,97,11,106,106>>}]}}}
> ** When Server state == {file_descriptor,prim_file,{#Port<0.904761>,24}}
> ** Reason for termination == 
> ** {timeout,{gen_server,call,
>                         [<0.12768.45>,
>                          {write,<<0,0,1,36,131,104,2,104,1,108,0,0,0,8,104,
>                                   2,109,0,0,0,7,112,114,111,100,117,99,116,
>                                   109,0,0,0,8,54,53,49,52,48,47,69,75,104,
>                                   2,109,0,0,0,11,116,114,97,110,115,97,99,
>                                   116,105,111,110,109,0,0,0,8,114,101,116,
>                                   114,105,101,118,101,104,2,109,0,0,0,4,
>                                   116,121,112,101,109,0,0,0,4,117,110,105,
>                                   116,104,2,109,0,0,0,11,97,114,99,104,105,
>                                   118,101,100,95,97,116,109,0,0,0,22,50,48,
>                                   48,56,48,50,50,50,84,49,50,49,52,48,53,
>                                   46,53,50,54,51,56,52,104,2,109,0,0,0,10,
>                                   99,114,101,97,116,101,100,95,97,116,109,
>                                   0,0,0,22,50,48,48,55,49,49,50,56,84,49,
>                                   53,52,50,48,54,46,51,52,52,54,49,56,104,
>                                   2,109,0,0,0,4,112,114,111,112,104,1,108,
>                                   0,0,0,2,104,2,109,0,0,0,8,108,111,99,97,
>                                   116,105,111,110,109,0,0,0,6,65,85,83,76,
>                                   65,71,104,2,109,0,0,0,6,104,101,105,103,
>                                   104,116,98,0,0,7,158,106,104,2,109,0,0,0,
>                                   3,109,117,105,109,0,0,0,18,51,52,48,48,
>                                   53,57,57,56,49,48,48,48,48,51,49,50,53,
>                                   50,104,2,109,0,0,0,8,113,117,97,110,116,
>                                   105,116,121,97,11,106,106>>}]}}
> [Mon, 05 Jan 2009 19:42:57 GMT] [error] [<0.12765.45>] {error_report,<0.22.0>,
>     {<0.12765.45>,crash_report,
>      [[{pid,<0.12765.45>},
>        {registered_name,[]},
>        {error_info,
>            {exit,
>                {timeout,
>                    {gen_server,call,
>                        [<0.12768.45>,
>                         {write,
>                             <<0,0,1,36,131,104,2,104,1,108,0,0,0,8,104,2,
>                               109,0,0,0,7,112,114,111,100,117,99,116,109,0,
>                               0,0,8,54,53,49,52,48,47,69,75,104,2,109,0,0,
>                               0,11,116,114,97,110,115,97,99,116,105,111,
>                               110,109,0,0,0,8,114,101,116,114,105,101,118,
>                               101,104,2,109,0,0,0,4,116,121,112,101,109,0,
>                               0,0,4,117,110,105,116,104,2,109,0,0,0,11,97,
>                               114,99,104,105,118,101,100,95,97,116,109,0,0,
>                               0,22,50,48,48,56,48,50,50,50,84,49,50,49,52,
>                               48,53,46,53,50,54,51,56,52,104,2,109,0,0,0,
>                               10,99,114,101,97,116,101,100,95,97,116,109,0,
>                               0,0,22,50,48,48,55,49,49,50,56,84,49,53,52,
>                               50,48,54,46,51,52,52,54,49,56,104,2,109,0,0,
>                               0,4,112,114,111,112,104,1,108,0,0,0,2,104,2,
>                               109,0,0,0,8,108,111,99,97,116,105,111,110,
>                               109,0,0,0,6,65,85,83,76,65,71,104,2,109,0,0,
>                               0,6,104,101,105,103,104,116,98,0,0,7,158,106,
>                               104,2,109,0,0,0,3,109,117,105,109,0,0,0,18,
>                               51,52,48,48,53,57,57,56,49,48,48,48,48,51,49,
>                               50,53,50,104,2,109,0,0,0,8,113,117,97,110,
>                               116,105,116,121,97,11,106,106>>}]}},
>                [{gen_server,terminate,6},{proc_lib,init_p_do_apply,3}]}},
>        {initial_call,{couch_file,init,['Argument__1']}},
>        {ancestors,[<0.12762.45>]},
>        {messages,[]},
>        {links,[#Port<0.904761>]},
>        {dictionary,[]},
>        {trap_exit,true},
>        {status,running},
>        {heap_size,987},
>        {stack_size,23},
>        {reductions,836156}],
>       []]}}
> [Mon, 05 Jan 2009 19:43:02 GMT] [error] [<0.22399.43>] ** Generic server <0.22399.43>
terminating 
> ** Last message in was {'EXIT',<0.10848.41>,
>                         {timeout,
>                          {gen_server,call,
>                           [<0.12768.45>,
>                            {write,
>                             <<0,0,1,36,131,104,2,104,1,108,0,0,0,8,104,2,
>                               109,0,0,0,7,112,114,111,100,117,99,116,109,
>                               0,0,0,8,54,53,49,52,48,47,69,75,104,2,109,0,
>                               0,0,11,116,114,97,110,115,97,99,116,105,111,
>                               110,109,0,0,0,8,114,101,116,114,105,101,118,
>                               101,104,2,109,0,0,0,4,116,121,112,101,109,0,
>                               0,0,4,117,110,105,116,104,2,109,0,0,0,11,97,
>                               114,99,104,105,118,101,100,95,97,116,109,0,
>                               0,0,22,50,48,48,56,48,50,50,50,84,49,50,49,
>                               52,48,53,46,53,50,54,51,56,52,104,2,109,0,0,
>                               0,10,99,114,101,97,116,101,100,95,97,116,
>                               109,0,0,0,22,50,48,48,55,49,49,50,56,84,49,
>                               53,52,50,48,54,46,51,52,52,54,49,56,104,2,
>                               109,0,0,0,4,112,114,111,112,104,1,108,0,0,0,
>                               2,104,2,109,0,0,0,8,108,111,99,97,116,105,
>                               111,110,109,0,0,0,6,65,85,83,76,65,71,104,2,
>                               109,0,0,0,6,104,101,105,103,104,116,98,0,0,
>                               7,158,106,104,2,109,0,0,0,3,109,117,105,109,
>                               0,0,0,18,51,52,48,48,53,57,57,56,49,48,48,
>                               48,48,51,49,50,53,50,104,2,109,0,0,0,8,113,
>                               117,97,110,116,105,116,121,97,11,106,106>>}]}}}
> ** When Server state == {file_descriptor,prim_file,{#Port<0.904494>,16}}
> ** Reason for termination == 
> ** {timeout,{gen_server,call,
>                         [<0.12768.45>,
>                          {write,<<0,0,1,36,131,104,2,104,1,108,0,0,0,8,104,
>                                   2,109,0,0,0,7,112,114,111,100,117,99,116,
>                                   109,0,0,0,8,54,53,49,52,48,47,69,75,104,
>                                   2,109,0,0,0,11,116,114,97,110,115,97,99,
>                                   116,105,111,110,109,0,0,0,8,114,101,116,
>                                   114,105,101,118,101,104,2,109,0,0,0,4,
>                                   116,121,112,101,109,0,0,0,4,117,110,105,
>                                   116,104,2,109,0,0,0,11,97,114,99,104,105,
>                                   118,101,100,95,97,116,109,0,0,0,22,50,48,
>                                   48,56,48,50,50,50,84,49,50,49,52,48,53,
>                                   46,53,50,54,51,56,52,104,2,109,0,0,0,10,
>                                   99,114,101,97,116,101,100,95,97,116,109,
>                                   0,0,0,22,50,48,48,55,49,49,50,56,84,49,
>                                   53,52,50,48,54,46,51,52,52,54,49,56,104,
>                                   2,109,0,0,0,4,112,114,111,112,104,1,108,
>                                   0,0,0,2,104,2,109,0,0,0,8,108,111,99,97,
>                                   116,105,111,110,109,0,0,0,6,65,85,83,76,
>                                   65,71,104,2,109,0,0,0,6,104,101,105,103,
>                                   104,116,98,0,0,7,158,106,104,2,109,0,0,0,
>                                   3,109,117,105,109,0,0,0,18,51,52,48,48,
>                                   53,57,57,56,49,48,48,48,48,51,49,50,53,
>                                   50,104,2,109,0,0,0,8,113,117,97,110,116,
>                                   105,116,121,97,11,106,106>>}]}}
> [Mon, 05 Jan 2009 19:43:28 GMT] [error] [<0.22399.43>] {error_report,<0.22.0>,
>     {<0.22399.43>,crash_report,
>      [[{pid,<0.22399.43>},
>        {registered_name,[]},
>        {error_info,
>            {exit,
>                {timeout,
>                    {gen_server,call,
>                        [<0.12768.45>,
>                         {write,
>                             <<0,0,1,36,131,104,2,104,1,108,0,0,0,8,104,2,
>                               109,0,0,0,7,112,114,111,100,117,99,116,109,0,
>                               0,0,8,54,53,49,52,48,47,69,75,104,2,109,0,0,
>                               0,11,116,114,97,110,115,97,99,116,105,111,
>                               110,109,0,0,0,8,114,101,116,114,105,101,118,
>                               101,104,2,109,0,0,0,4,116,121,112,101,109,0,
>                               0,0,4,117,110,105,116,104,2,109,0,0,0,11,97,
>                               114,99,104,105,118,101,100,95,97,116,109,0,0,
>                               0,22,50,48,48,56,48,50,50,50,84,49,50,49,52,
>                               48,53,46,53,50,54,51,56,52,104,2,109,0,0,0,
>                               10,99,114,101,97,116,101,100,95,97,116,109,0,
>                               0,0,22,50,48,48,55,49,49,50,56,84,49,53,52,
>                               50,48,54,46,51,52,52,54,49,56,104,2,109,0,0,
>                               0,4,112,114,111,112,104,1,108,0,0,0,2,104,2,
>                               109,0,0,0,8,108,111,99,97,116,105,111,110,
>                               109,0,0,0,6,65,85,83,76,65,71,104,2,109,0,0,
>                               0,6,104,101,105,103,104,116,98,0,0,7,158,106,
>                               104,2,109,0,0,0,3,109,117,105,109,0,0,0,18,
>                               51,52,48,48,53,57,57,56,49,48,48,48,48,51,49,
>                               50,53,50,104,2,109,0,0,0,8,113,117,97,110,
>                               116,105,116,121,97,11,106,106>>}]}},
>                [{gen_server,terminate,6},{proc_lib,init_p_do_apply,3}]}},
>        {initial_call,{couch_file,init,['Argument__1']}},
>        {ancestors,
>            [<0.10848.41>,<0.10847.41>,couch_server,couch_primary_services,
>             couch_server_sup,<0.1.0>]},
>        {messages,
>            [{'DOWN',#Ref<0.0.81.132266>,process,<0.10847.41>,
>                 {timeout,
>                     {gen_server,call,
>                         [<0.12768.45>,
>                          {write,
>                              <<0,0,1,36,131,104,2,104,1,108,0,0,0,8,104,
>                                2,109,0,0,0,7,112,114,111,100,117,99,116,
>                                109,0,0,0,8,54,53,49,52,48,47,69,75,104,
>                                2,109,0,0,0,11,116,114,97,110,115,97,99,
>                                116,105,111,110,109,0,0,0,8,114,101,116,
>                                114,105,101,118,101,104,2,109,0,0,0,4,
>                                116,121,112,101,109,0,0,0,4,117,110,105,
>                                116,104,2,109,0,0,0,11,97,114,99,104,105,
>                                118,101,100,95,97,116,109,0,0,0,22,50,48,
>                                48,56,48,50,50,50,84,49,50,49,52,48,53,
>                                46,53,50,54,51,56,52,104,2,109,0,0,0,10,
>                                99,114,101,97,116,101,100,95,97,116,109,
>                                0,0,0,22,50,48,48,55,49,49,50,56,84,49,
>                                53,52,50,48,54,46,51,52,52,54,49,56,104,
>                                2,109,0,0,0,4,112,114,111,112,104,1,108,
>                                0,0,0,2,104,2,109,0,0,0,8,108,111,99,97,
>                                116,105,111,110,109,0,0,0,6,65,85,83,76,
>                                65,71,104,2,109,0,0,0,6,104,101,105,103,
>                                104,116,98,0,0,7,158,106,104,2,109,0,0,0,
>                                3,109,117,105,109,0,0,0,18,51,52,48,48,
>                                53,57,57,56,49,48,48,48,48,51,49,50,53,
>                                50,104,2,109,0,0,0,8,113,117,97,110,116,
>                                105,116,121,97,11,106,106>>}]}}}]},
>        {links,[#Port<0.904494>]},
>        {dictionary,[{<0.10847.41>,{#Ref<0.0.81.132266>,1}}]},
>        {trap_exit,true},
>        {status,running},
>        {heap_size,987},
>        {stack_size,23},
>        {reductions,5627554}],
>       []]}}
> (and nothing further)
> I can still access couchdb1 (the source), but every trivial request takes exactly 5015 ms:
> balancer:/filespace/couchdb/log# time curl -i http://127.0.0.1:5984/
> HTTP/1.1 200 OK
> Server: CouchDB/0.9.0a731357-incubating (Erlang OTP/R12B)
> Date: Mon, 05 Jan 2009 20:45:46 GMT
> Content-Type: text/plain;charset=utf-8
> Content-Length: 102
> Cache-Control: must-revalidate
> {"couchdb":"Welcome","version":"0.9.0a731357-incubating","start_time":"Sun, 04 Jan 2009
21:43:13 GMT"}
> real	0m5.015s
> user	0m0.008s
> sys	0m0.000s
> No log entries are created on couchdb1 for these requests.
> Meanwhile, on the destination server (couchdb2) I can see a lot of activity:
> [Mon, 05 Jan 2009 20:47:58 GMT] [info] [<0.19601.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs
200
> ... 40 lines...
> [Mon, 05 Jan 2009 20:47:58 GMT] [info] [<0.19644.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs
200
> [Mon, 05 Jan 2009 20:48:13 GMT] [info] [<0.19652.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs
200
> ... ca 200 lines
> [Mon, 05 Jan 2009 20:48:13 GMT] [info] [<0.19744.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs
200
> [Mon, 05 Jan 2009 20:48:28 GMT] [info] [<0.19747.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs
200
> ... ca 200 lines
> [Mon, 05 Jan 2009 20:48:28 GMT] [info] [<0.19844.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs
200
> [Mon, 05 Jan 2009 20:48:43 GMT] [info] [<0.19944.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs
200
> ... ca 200 lines
> [Mon, 05 Jan 2009 20:48:58 GMT] [info] [<0.19948.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs
200
> ... ca 200 lines
> [Mon, 05 Jan 2009 20:48:58 GMT] [info] [<0.20044.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs
200
> [Mon, 05 Jan 2009 20:49:13 GMT] [info] [<0.20045.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs
200
> But the number of documents on the destination server hasn't been increasing in the meantime:
> {"db_name":"hulog_events","doc_count":25926,"doc_del_count":10074,"update_seq":36000,"purge_seq":0,"compact_running":false,"disk_size":21927524}
> couchdb1 is CouchDB 0.9.0a731357-incubating
> couchdb2 is CouchDB 0.9.0a730405-incubating

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

