From: "Jason Davies (JIRA)"
To: dev@couchdb.apache.org
Reply-To: dev@couchdb.apache.org
Date: Wed, 28 Jan 2009 04:14:00 -0800 (PST)
Message-ID: <527750903.1233144840066.JavaMail.jira@brutus>
Subject: [jira] Commented: (COUCHDB-197) Replication renders CouchDB unresponsive.
In-Reply-To: <1679573705.1231189184283.JavaMail.jira@brutus>

    [ https://issues.apache.org/jira/browse/COUCHDB-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668011#action_12668011 ]

Jason Davies commented on COUCHDB-197:
--------------------------------------

Hi Adam, thanks for the feedback. I've updated the code and put it on GitHub to keep better track of it: http://github.com/jasondavies/couchdb/tree/master (feel free to fork and make further changes). It now retries all requests, and also retries if it gets a 500 error response.

I've tested redirects (using couchdb_httpd:absolute_uri) behind an nginx SSL proxy and they work fine. The only problem is that nginx appears to rewrite design doc URLs automatically (i.e. _design%2Ftest -> _design/test), so they never need a redirect. The /_utils URL redirects to /_utils/ as expected, though. The only requirement is that the proxy rewrites the request's Host: header and the response's Location: header appropriately; I'm not sure whether all proxies do this.

1) I've changed the 5 second timeout to 10 seconds for now. Any ideas what an appropriate value would be?
2) What value of ChunkSize should we use here?

I agree that getting to the bottom of these errors would be a good idea, in case there is something amiss in MochiWeb...

> Replication renders CouchDB unresponsive.
> -----------------------------------------
>
> Key: COUCHDB-197
> URL: https://issues.apache.org/jira/browse/COUCHDB-197
> Project: CouchDB
> Issue Type: Bug
> Components: Database Core
> Affects Versions: 0.9
> Reporter: Maximillian Dornseif
> Priority: Blocker
> Fix For: 0.9
>
> Attachments: couch_redirects.2.diff, couch_redirects.3.diff, couch_redirects.diff, couch_tests.js.diff, ibrowse.diff, push_replication_fix.diff
>
>
> I am quite sure this is not the same issue as in COUCHDB-193.
> I'm trying to replicate a somewhat big database {"doc_count":541394,"doc_del_count":265692,"update_seq":2118390,"purge_seq":0,"compact_running":false,"disk_size":16552608803} to another machine.
> I started replication with this:
> send: 'POST /_replicate HTTP/1.1\r\nHost: couchdb1.local.xxx:5984\r\nAccept-Encoding: identity\r\ncontent-length: 90\r\ncontent-type: application/json\r\naccept: application/json\r\nuser-agent: couchdb-python 0.5dev-r127\r\n\r\n'
> send: '{"source": "hulog_events", "target": "http://couchdb2.local.xxx:5984/hulog_events"}'
> reply: ''
> connect: (couchdb1.local.hudora.biz, 5984)
> send: 'POST /_replicate HTTP/1.1\r\nHost: couchdb1.local.xxxx:5984\r\nAccept-Encoding: identity\r\ncontent-length: 90\r\ncontent-type: application/json\r\naccept: application/json\r\nuser-agent: couchdb-python 0.5dev-r127\r\n\r\n'
> send: '{"source": "hulog_events", "target": "http://couchdb2.local.xxxx:5984/hulog_events"}'
> (no reply so far)
> On the source server (couchdb1) I see the following log entries:
> [Mon, 05 Jan 2009 19:34:21 GMT] [info] [<0.12745.45>] 192.168.0.30 - - 'POST' /_replicate 200
> [Mon, 05 Jan 2009 19:35:36 GMT] [info] [<0.107.0>] Compaction for db "hulog_events_test" completed.
> [Mon, 05 Jan 2009 19:35:45 GMT] [info] [<0.12746.45>] 127.0.0.1 - - 'GET' /hulog_events/ 200
> [Mon, 05 Jan 2009 19:35:46 GMT] [info] [<0.95.0>] Compaction for db "eap" completed.
> [Mon, 05 Jan 2009 19:42:17 GMT] [error] [<0.12765.45>] ** Generic server <0.12765.45> terminating > ** Last message in was {'EXIT',<0.12762.45>, > {timeout, > {gen_server,call, > [<0.12768.45>, > {write, > <<0,0,1,36,131,104,2,104,1,108,0,0,0,8,104,2, > 109,0,0,0,7,112,114,111,100,117,99,116,109, > 0,0,0,8,54,53,49,52,48,47,69,75,104,2,109,0, > 0,0,11,116,114,97,110,115,97,99,116,105,111, > 110,109,0,0,0,8,114,101,116,114,105,101,118, > 101,104,2,109,0,0,0,4,116,121,112,101,109,0, > 0,0,4,117,110,105,116,104,2,109,0,0,0,11,97, > 114,99,104,105,118,101,100,95,97,116,109,0, > 0,0,22,50,48,48,56,48,50,50,50,84,49,50,49, > 52,48,53,46,53,50,54,51,56,52,104,2,109,0,0, > 0,10,99,114,101,97,116,101,100,95,97,116, > 109,0,0,0,22,50,48,48,55,49,49,50,56,84,49, > 53,52,50,48,54,46,51,52,52,54,49,56,104,2, > 109,0,0,0,4,112,114,111,112,104,1,108,0,0,0, > 2,104,2,109,0,0,0,8,108,111,99,97,116,105, > 111,110,109,0,0,0,6,65,85,83,76,65,71,104,2, > 109,0,0,0,6,104,101,105,103,104,116,98,0,0, > 7,158,106,104,2,109,0,0,0,3,109,117,105,109, > 0,0,0,18,51,52,48,48,53,57,57,56,49,48,48, > 48,48,51,49,50,53,50,104,2,109,0,0,0,8,113, > 117,97,110,116,105,116,121,97,11,106,106>>}]}}} > ** When Server state == {file_descriptor,prim_file,{#Port<0.904761>,24}} > ** Reason for termination == > ** {timeout,{gen_server,call, > [<0.12768.45>, > {write,<<0,0,1,36,131,104,2,104,1,108,0,0,0,8,104, > 2,109,0,0,0,7,112,114,111,100,117,99,116, > 109,0,0,0,8,54,53,49,52,48,47,69,75,104, > 2,109,0,0,0,11,116,114,97,110,115,97,99, > 116,105,111,110,109,0,0,0,8,114,101,116, > 114,105,101,118,101,104,2,109,0,0,0,4, > 116,121,112,101,109,0,0,0,4,117,110,105, > 116,104,2,109,0,0,0,11,97,114,99,104,105, > 118,101,100,95,97,116,109,0,0,0,22,50,48, > 48,56,48,50,50,50,84,49,50,49,52,48,53, > 46,53,50,54,51,56,52,104,2,109,0,0,0,10, > 99,114,101,97,116,101,100,95,97,116,109, > 0,0,0,22,50,48,48,55,49,49,50,56,84,49, > 53,52,50,48,54,46,51,52,52,54,49,56,104, > 2,109,0,0,0,4,112,114,111,112,104,1,108, > 
0,0,0,2,104,2,109,0,0,0,8,108,111,99,97, > 116,105,111,110,109,0,0,0,6,65,85,83,76, > 65,71,104,2,109,0,0,0,6,104,101,105,103, > 104,116,98,0,0,7,158,106,104,2,109,0,0,0, > 3,109,117,105,109,0,0,0,18,51,52,48,48, > 53,57,57,56,49,48,48,48,48,51,49,50,53, > 50,104,2,109,0,0,0,8,113,117,97,110,116, > 105,116,121,97,11,106,106>>}]}} > [Mon, 05 Jan 2009 19:42:57 GMT] [error] [<0.12765.45>] {error_report,<0.22.0>, > {<0.12765.45>,crash_report, > [[{pid,<0.12765.45>}, > {registered_name,[]}, > {error_info, > {exit, > {timeout, > {gen_server,call, > [<0.12768.45>, > {write, > <<0,0,1,36,131,104,2,104,1,108,0,0,0,8,104,2, > 109,0,0,0,7,112,114,111,100,117,99,116,109,0, > 0,0,8,54,53,49,52,48,47,69,75,104,2,109,0,0, > 0,11,116,114,97,110,115,97,99,116,105,111, > 110,109,0,0,0,8,114,101,116,114,105,101,118, > 101,104,2,109,0,0,0,4,116,121,112,101,109,0, > 0,0,4,117,110,105,116,104,2,109,0,0,0,11,97, > 114,99,104,105,118,101,100,95,97,116,109,0,0, > 0,22,50,48,48,56,48,50,50,50,84,49,50,49,52, > 48,53,46,53,50,54,51,56,52,104,2,109,0,0,0, > 10,99,114,101,97,116,101,100,95,97,116,109,0, > 0,0,22,50,48,48,55,49,49,50,56,84,49,53,52, > 50,48,54,46,51,52,52,54,49,56,104,2,109,0,0, > 0,4,112,114,111,112,104,1,108,0,0,0,2,104,2, > 109,0,0,0,8,108,111,99,97,116,105,111,110, > 109,0,0,0,6,65,85,83,76,65,71,104,2,109,0,0, > 0,6,104,101,105,103,104,116,98,0,0,7,158,106, > 104,2,109,0,0,0,3,109,117,105,109,0,0,0,18, > 51,52,48,48,53,57,57,56,49,48,48,48,48,51,49, > 50,53,50,104,2,109,0,0,0,8,113,117,97,110, > 116,105,116,121,97,11,106,106>>}]}}, > [{gen_server,terminate,6},{proc_lib,init_p_do_apply,3}]}}, > {initial_call,{couch_file,init,['Argument__1']}}, > {ancestors,[<0.12762.45>]}, > {messages,[]}, > {links,[#Port<0.904761>]}, > {dictionary,[]}, > {trap_exit,true}, > {status,running}, > {heap_size,987}, > {stack_size,23}, > {reductions,836156}], > []]}} > [Mon, 05 Jan 2009 19:43:02 GMT] [error] [<0.22399.43>] ** Generic server <0.22399.43> terminating > ** Last message in was 
{'EXIT',<0.10848.41>, > {timeout, > {gen_server,call, > [<0.12768.45>, > {write, > <<0,0,1,36,131,104,2,104,1,108,0,0,0,8,104,2, > 109,0,0,0,7,112,114,111,100,117,99,116,109, > 0,0,0,8,54,53,49,52,48,47,69,75,104,2,109,0, > 0,0,11,116,114,97,110,115,97,99,116,105,111, > 110,109,0,0,0,8,114,101,116,114,105,101,118, > 101,104,2,109,0,0,0,4,116,121,112,101,109,0, > 0,0,4,117,110,105,116,104,2,109,0,0,0,11,97, > 114,99,104,105,118,101,100,95,97,116,109,0, > 0,0,22,50,48,48,56,48,50,50,50,84,49,50,49, > 52,48,53,46,53,50,54,51,56,52,104,2,109,0,0, > 0,10,99,114,101,97,116,101,100,95,97,116, > 109,0,0,0,22,50,48,48,55,49,49,50,56,84,49, > 53,52,50,48,54,46,51,52,52,54,49,56,104,2, > 109,0,0,0,4,112,114,111,112,104,1,108,0,0,0, > 2,104,2,109,0,0,0,8,108,111,99,97,116,105, > 111,110,109,0,0,0,6,65,85,83,76,65,71,104,2, > 109,0,0,0,6,104,101,105,103,104,116,98,0,0, > 7,158,106,104,2,109,0,0,0,3,109,117,105,109, > 0,0,0,18,51,52,48,48,53,57,57,56,49,48,48, > 48,48,51,49,50,53,50,104,2,109,0,0,0,8,113, > 117,97,110,116,105,116,121,97,11,106,106>>}]}}} > ** When Server state == {file_descriptor,prim_file,{#Port<0.904494>,16}} > ** Reason for termination == > ** {timeout,{gen_server,call, > [<0.12768.45>, > {write,<<0,0,1,36,131,104,2,104,1,108,0,0,0,8,104, > 2,109,0,0,0,7,112,114,111,100,117,99,116, > 109,0,0,0,8,54,53,49,52,48,47,69,75,104, > 2,109,0,0,0,11,116,114,97,110,115,97,99, > 116,105,111,110,109,0,0,0,8,114,101,116, > 114,105,101,118,101,104,2,109,0,0,0,4, > 116,121,112,101,109,0,0,0,4,117,110,105, > 116,104,2,109,0,0,0,11,97,114,99,104,105, > 118,101,100,95,97,116,109,0,0,0,22,50,48, > 48,56,48,50,50,50,84,49,50,49,52,48,53, > 46,53,50,54,51,56,52,104,2,109,0,0,0,10, > 99,114,101,97,116,101,100,95,97,116,109, > 0,0,0,22,50,48,48,55,49,49,50,56,84,49, > 53,52,50,48,54,46,51,52,52,54,49,56,104, > 2,109,0,0,0,4,112,114,111,112,104,1,108, > 0,0,0,2,104,2,109,0,0,0,8,108,111,99,97, > 116,105,111,110,109,0,0,0,6,65,85,83,76, > 65,71,104,2,109,0,0,0,6,104,101,105,103, > 
104,116,98,0,0,7,158,106,104,2,109,0,0,0, > 3,109,117,105,109,0,0,0,18,51,52,48,48, > 53,57,57,56,49,48,48,48,48,51,49,50,53, > 50,104,2,109,0,0,0,8,113,117,97,110,116, > 105,116,121,97,11,106,106>>}]}} > [Mon, 05 Jan 2009 19:43:28 GMT] [error] [<0.22399.43>] {error_report,<0.22.0>, > {<0.22399.43>,crash_report, > [[{pid,<0.22399.43>}, > {registered_name,[]}, > {error_info, > {exit, > {timeout, > {gen_server,call, > [<0.12768.45>, > {write, > <<0,0,1,36,131,104,2,104,1,108,0,0,0,8,104,2, > 109,0,0,0,7,112,114,111,100,117,99,116,109,0, > 0,0,8,54,53,49,52,48,47,69,75,104,2,109,0,0, > 0,11,116,114,97,110,115,97,99,116,105,111, > 110,109,0,0,0,8,114,101,116,114,105,101,118, > 101,104,2,109,0,0,0,4,116,121,112,101,109,0, > 0,0,4,117,110,105,116,104,2,109,0,0,0,11,97, > 114,99,104,105,118,101,100,95,97,116,109,0,0, > 0,22,50,48,48,56,48,50,50,50,84,49,50,49,52, > 48,53,46,53,50,54,51,56,52,104,2,109,0,0,0, > 10,99,114,101,97,116,101,100,95,97,116,109,0, > 0,0,22,50,48,48,55,49,49,50,56,84,49,53,52, > 50,48,54,46,51,52,52,54,49,56,104,2,109,0,0, > 0,4,112,114,111,112,104,1,108,0,0,0,2,104,2, > 109,0,0,0,8,108,111,99,97,116,105,111,110, > 109,0,0,0,6,65,85,83,76,65,71,104,2,109,0,0, > 0,6,104,101,105,103,104,116,98,0,0,7,158,106, > 104,2,109,0,0,0,3,109,117,105,109,0,0,0,18, > 51,52,48,48,53,57,57,56,49,48,48,48,48,51,49, > 50,53,50,104,2,109,0,0,0,8,113,117,97,110, > 116,105,116,121,97,11,106,106>>}]}}, > [{gen_server,terminate,6},{proc_lib,init_p_do_apply,3}]}}, > {initial_call,{couch_file,init,['Argument__1']}}, > {ancestors, > [<0.10848.41>,<0.10847.41>,couch_server,couch_primary_services, > couch_server_sup,<0.1.0>]}, > {messages, > [{'DOWN',#Ref<0.0.81.132266>,process,<0.10847.41>, > {timeout, > {gen_server,call, > [<0.12768.45>, > {write, > <<0,0,1,36,131,104,2,104,1,108,0,0,0,8,104, > 2,109,0,0,0,7,112,114,111,100,117,99,116, > 109,0,0,0,8,54,53,49,52,48,47,69,75,104, > 2,109,0,0,0,11,116,114,97,110,115,97,99, > 116,105,111,110,109,0,0,0,8,114,101,116, > 
114,105,101,118,101,104,2,109,0,0,0,4, > 116,121,112,101,109,0,0,0,4,117,110,105, > 116,104,2,109,0,0,0,11,97,114,99,104,105, > 118,101,100,95,97,116,109,0,0,0,22,50,48, > 48,56,48,50,50,50,84,49,50,49,52,48,53, > 46,53,50,54,51,56,52,104,2,109,0,0,0,10, > 99,114,101,97,116,101,100,95,97,116,109, > 0,0,0,22,50,48,48,55,49,49,50,56,84,49, > 53,52,50,48,54,46,51,52,52,54,49,56,104, > 2,109,0,0,0,4,112,114,111,112,104,1,108, > 0,0,0,2,104,2,109,0,0,0,8,108,111,99,97, > 116,105,111,110,109,0,0,0,6,65,85,83,76, > 65,71,104,2,109,0,0,0,6,104,101,105,103, > 104,116,98,0,0,7,158,106,104,2,109,0,0,0, > 3,109,117,105,109,0,0,0,18,51,52,48,48, > 53,57,57,56,49,48,48,48,48,51,49,50,53, > 50,104,2,109,0,0,0,8,113,117,97,110,116, > 105,116,121,97,11,106,106>>}]}}}]}, > {links,[#Port<0.904494>]}, > {dictionary,[{<0.10847.41>,{#Ref<0.0.81.132266>,1}}]}, > {trap_exit,true}, > {status,running}, > {heap_size,987}, > {stack_size,23}, > {reductions,5627554}], > []]}} > (and nothing further) > I still can access couchdb1 (the source) but every trivial request takes exactly 5015ms: > balancer:/filespace/couchdb/log# time curl -i http://127.0.0.1:5984/ > HTTP/1.1 200 OK > Server: CouchDB/0.9.0a731357-incubating (Erlang OTP/R12B) > Date: Mon, 05 Jan 2009 20:45:46 GMT > Content-Type: text/plain;charset=utf-8 > Content-Length: 102 > Cache-Control: must-revalidate > {"couchdb":"Welcome","version":"0.9.0a731357-incubating","start_time":"Sun, 04 Jan 2009 21:43:13 GMT"} > real 0m5.015s > user 0m0.008s > sys 0m0.000s > For these accesses no log entries on couchdb1 are created. > Meanwhile on the destination server (couchdb2) I can see lot of activity: > [Mon, 05 Jan 2009 20:47:58 GMT] [info] [<0.19601.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs 200 > ... 40 lines... 
> [Mon, 05 Jan 2009 20:47:58 GMT] [info] [<0.19644.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs 200
> [Mon, 05 Jan 2009 20:48:13 GMT] [info] [<0.19652.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs 200
> ... ca. 200 lines
> [Mon, 05 Jan 2009 20:48:13 GMT] [info] [<0.19744.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs 200
> [Mon, 05 Jan 2009 20:48:28 GMT] [info] [<0.19747.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs 200
> ... ca. 200 lines
> [Mon, 05 Jan 2009 20:48:28 GMT] [info] [<0.19844.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs 200
> [Mon, 05 Jan 2009 20:48:43 GMT] [info] [<0.19944.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs 200
> ... ca. 200 lines
> [Mon, 05 Jan 2009 20:48:58 GMT] [info] [<0.19948.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs 200
> ... ca. 200 lines
> [Mon, 05 Jan 2009 20:48:58 GMT] [info] [<0.20044.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs 200
> [Mon, 05 Jan 2009 20:49:13 GMT] [info] [<0.20045.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs 200
> But the number of documents on the destination server hasn't been increasing in the meantime:
> {"db_name":"hulog_events","doc_count":25926,"doc_del_count":10074,"update_seq":36000,"purge_seq":0,"compact_running":false,"disk_size":21927524}
> couchdb1 is CouchDB 0.9.0a731357-incubating
> couchdb2 is CouchDB 0.9.0a730405-incubating

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
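For readers of the archive: the comment above says the patched replicator "retries all requests and also retries if it gets a 500 error response". The snippet below is only a rough Python illustration of that idea, not the Erlang code from the linked branch. The function name request_with_retry, its parameters, the choice of OSError as the retryable failure, and the linear pause between attempts are all assumptions made for this sketch.

```python
import time
from typing import Callable, Optional, Tuple

Response = Tuple[int, bytes]  # (HTTP status code, response body)


def request_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    pause_seconds: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() until it returns a non-500 response or attempts run out.

    send is a hypothetical zero-argument callable that performs one HTTP
    request (with whatever per-request timeout the caller has chosen) and
    returns (status, body). Network failures are expected to surface as
    OSError, which covers socket timeouts and connection resets.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            status, body = send()
        except OSError as exc:
            last_error = exc  # transport-level failure: try again
        else:
            if status != 500:
                return status, body  # any non-500 answer is handed back
            last_error = RuntimeError("server answered 500")
        if attempt < max_attempts:
            sleep(pause_seconds * attempt)  # wait a little longer each time
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error
```

The per-request timeout discussed in question 1 would live inside send (for example as the timeout argument of whatever HTTP client is used), so the total worst-case wait is roughly max_attempts times that timeout plus the pauses. That product, rather than the single-request value alone, is probably what matters when picking between 5 and 10 seconds.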