Return-Path: Delivered-To: apmail-couchdb-dev-archive@www.apache.org Received: (qmail 90203 invoked from network); 26 Jan 2009 21:02:30 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2) by minotaur.apache.org with SMTP; 26 Jan 2009 21:02:30 -0000 Received: (qmail 11740 invoked by uid 500); 26 Jan 2009 21:02:29 -0000 Delivered-To: apmail-couchdb-dev-archive@couchdb.apache.org Received: (qmail 11701 invoked by uid 500); 26 Jan 2009 21:02:29 -0000 Mailing-List: contact dev-help@couchdb.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@couchdb.apache.org Delivered-To: mailing list dev@couchdb.apache.org Received: (qmail 11690 invoked by uid 99); 26 Jan 2009 21:02:29 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 26 Jan 2009 13:02:29 -0800 X-ASF-Spam-Status: No, hits=-1996.4 required=10.0 tests=ALL_TRUSTED,FS_REPLICA,NORMAL_HTTP_TO_IP,URIBL_RHS_DOB,WEIRD_PORT X-Spam-Check-By: apache.org Received: from [140.211.11.140] (HELO brutus.apache.org) (140.211.11.140) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 26 Jan 2009 21:02:20 +0000 Received: from brutus (localhost [127.0.0.1]) by brutus.apache.org (Postfix) with ESMTP id 034F9234C498 for ; Mon, 26 Jan 2009 13:02:00 -0800 (PST) Message-ID: <1792254680.1233003720004.JavaMail.jira@brutus> Date: Mon, 26 Jan 2009 13:02:00 -0800 (PST) From: "Jason Davies (JIRA)" To: dev@couchdb.apache.org Subject: [jira] Updated: (COUCHDB-197) Replication renders CouchDB unresponsive. In-Reply-To: <1679573705.1231189184283.JavaMail.jira@brutus> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Virus-Checked: Checked by ClamAV on apache.org [ https://issues.apache.org/jira/browse/COUCHDB-197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Davies updated COUCHDB-197: --------------------------------- Attachment: ibrowse.diff This patch integrates ibrowse 1.4.1 (from http://sourceforge.net/cvs/?group_id=73688) with couch_rep.erl. It mostly seems to have cleaned up my strange problems, although I do still get occasional connection_closed errors. I've made it retry requests up to 10 times if errors are encountered, and I've limited this to GET requests on Adam Kocoloski's suggestion. This patch also includes my couch_redirects.3 patch. Apparently ibrowse is "dual-licensed" under LGPL and BSD so I think it's safe... > Replication renders CouchDB unresponsive. > ----------------------------------------- > > Key: COUCHDB-197 > URL: https://issues.apache.org/jira/browse/COUCHDB-197 > Project: CouchDB > Issue Type: Bug > Components: Database Core > Affects Versions: 0.9 > Reporter: Maximillian Dornseif > Priority: Blocker > Fix For: 0.9 > > Attachments: couch_redirects.2.diff, couch_redirects.3.diff, couch_redirects.diff, couch_tests.js.diff, ibrowse.diff, push_replication_fix.diff > > > I am quite sure this is not the same issue as in COUCHDB-193. > Im trying to replicte a somewhat big database {"doc_count":541394,"doc_del_count":265692,"update_seq":2118390,"purge_seq":0,"compact_running":false,"disk_size":16552608803} to an other machine. > I started replication with this: > send: 'POST /_replicate HTTP/1.1\r\nHost: couchdb1.local.xxx:5984\r\nAccept-Encoding: identity\r\ncontent-length: 90\r\ncontent-type: application/json\r\naccept: application/json\r\nuser-agent: couchdb-python 0.5dev-r127\r\n\r\n' > send: '{"source": "hulog_events", "target": "http://couchdb2.local.xxx:5984/hulog_events"}' > reply: '' > connect: (couchdb1.local.hudora.biz, 5984) > send: 'POST /_replicate HTTP/1.1\r\nHost: couchdb1.local.xxxx:5984\r\nAccept-Encoding: identity\r\ncontent-length: 90\r\ncontent-type: application/json\r\naccept: application/json\r\nuser-agent: couchdb-python 0.5dev-r127\r\n\r\n' > send: '{"source": "hulog_events", "target": "http://couchdb2.local.xxxx:5984/hulog_events"}' > (no reply so far) > On the source server (couchdb1) I see following logentries: > Mon, 05 Jan 2009 19:34:21 GMT] [info] [<0.12745.45>] 192.168.0.30 - - 'POST' /_replicate 200 > [Mon, 05 Jan 2009 19:35:36 GMT] [info] [<0.107.0>] Compaction for db "hulog_events_test" completed. > [Mon, 05 Jan 2009 19:35:45 GMT] [info] [<0.12746.45>] 127.0.0.1 - - 'GET' /hulog_events/ 200 > [Mon, 05 Jan 2009 19:35:46 GMT] [info] [<0.95.0>] Compaction for db "eap" completed. > [Mon, 05 Jan 2009 19:42:17 GMT] [error] [<0.12765.45>] ** Generic server <0.12765.45> terminating > ** Last message in was {'EXIT',<0.12762.45>, > {timeout, > {gen_server,call, > [<0.12768.45>, > {write, > <<0,0,1,36,131,104,2,104,1,108,0,0,0,8,104,2, > 109,0,0,0,7,112,114,111,100,117,99,116,109, > 0,0,0,8,54,53,49,52,48,47,69,75,104,2,109,0, > 0,0,11,116,114,97,110,115,97,99,116,105,111, > 110,109,0,0,0,8,114,101,116,114,105,101,118, > 101,104,2,109,0,0,0,4,116,121,112,101,109,0, > 0,0,4,117,110,105,116,104,2,109,0,0,0,11,97, > 114,99,104,105,118,101,100,95,97,116,109,0, > 0,0,22,50,48,48,56,48,50,50,50,84,49,50,49, > 52,48,53,46,53,50,54,51,56,52,104,2,109,0,0, > 0,10,99,114,101,97,116,101,100,95,97,116, > 109,0,0,0,22,50,48,48,55,49,49,50,56,84,49, > 53,52,50,48,54,46,51,52,52,54,49,56,104,2, > 109,0,0,0,4,112,114,111,112,104,1,108,0,0,0, > 2,104,2,109,0,0,0,8,108,111,99,97,116,105, > 111,110,109,0,0,0,6,65,85,83,76,65,71,104,2, > 109,0,0,0,6,104,101,105,103,104,116,98,0,0, > 7,158,106,104,2,109,0,0,0,3,109,117,105,109, > 0,0,0,18,51,52,48,48,53,57,57,56,49,48,48, > 48,48,51,49,50,53,50,104,2,109,0,0,0,8,113, > 117,97,110,116,105,116,121,97,11,106,106>>}]}}} > ** When Server state == {file_descriptor,prim_file,{#Port<0.904761>,24}} > ** Reason for termination == > ** {timeout,{gen_server,call, > [<0.12768.45>, > {write,<<0,0,1,36,131,104,2,104,1,108,0,0,0,8,104, > 2,109,0,0,0,7,112,114,111,100,117,99,116, > 109,0,0,0,8,54,53,49,52,48,47,69,75,104, > 2,109,0,0,0,11,116,114,97,110,115,97,99, > 116,105,111,110,109,0,0,0,8,114,101,116, > 114,105,101,118,101,104,2,109,0,0,0,4, > 116,121,112,101,109,0,0,0,4,117,110,105, > 116,104,2,109,0,0,0,11,97,114,99,104,105, > 118,101,100,95,97,116,109,0,0,0,22,50,48, > 48,56,48,50,50,50,84,49,50,49,52,48,53, > 46,53,50,54,51,56,52,104,2,109,0,0,0,10, > 99,114,101,97,116,101,100,95,97,116,109, > 0,0,0,22,50,48,48,55,49,49,50,56,84,49, > 53,52,50,48,54,46,51,52,52,54,49,56,104, > 2,109,0,0,0,4,112,114,111,112,104,1,108, > 0,0,0,2,104,2,109,0,0,0,8,108,111,99,97, > 116,105,111,110,109,0,0,0,6,65,85,83,76, > 65,71,104,2,109,0,0,0,6,104,101,105,103, > 104,116,98,0,0,7,158,106,104,2,109,0,0,0, > 3,109,117,105,109,0,0,0,18,51,52,48,48, > 53,57,57,56,49,48,48,48,48,51,49,50,53, > 50,104,2,109,0,0,0,8,113,117,97,110,116, > 105,116,121,97,11,106,106>>}]}} > [Mon, 05 Jan 2009 19:42:57 GMT] [error] [<0.12765.45>] {error_report,<0.22.0>, > {<0.12765.45>,crash_report, > [[{pid,<0.12765.45>}, > {registered_name,[]}, > {error_info, > {exit, > {timeout, > {gen_server,call, > [<0.12768.45>, > {write, > <<0,0,1,36,131,104,2,104,1,108,0,0,0,8,104,2, > 109,0,0,0,7,112,114,111,100,117,99,116,109,0, > 0,0,8,54,53,49,52,48,47,69,75,104,2,109,0,0, > 0,11,116,114,97,110,115,97,99,116,105,111, > 110,109,0,0,0,8,114,101,116,114,105,101,118, > 101,104,2,109,0,0,0,4,116,121,112,101,109,0, > 0,0,4,117,110,105,116,104,2,109,0,0,0,11,97, > 114,99,104,105,118,101,100,95,97,116,109,0,0, > 0,22,50,48,48,56,48,50,50,50,84,49,50,49,52, > 48,53,46,53,50,54,51,56,52,104,2,109,0,0,0, > 10,99,114,101,97,116,101,100,95,97,116,109,0, > 0,0,22,50,48,48,55,49,49,50,56,84,49,53,52, > 50,48,54,46,51,52,52,54,49,56,104,2,109,0,0, > 0,4,112,114,111,112,104,1,108,0,0,0,2,104,2, > 109,0,0,0,8,108,111,99,97,116,105,111,110, > 109,0,0,0,6,65,85,83,76,65,71,104,2,109,0,0, > 0,6,104,101,105,103,104,116,98,0,0,7,158,106, > 104,2,109,0,0,0,3,109,117,105,109,0,0,0,18, > 51,52,48,48,53,57,57,56,49,48,48,48,48,51,49, > 50,53,50,104,2,109,0,0,0,8,113,117,97,110, > 116,105,116,121,97,11,106,106>>}]}}, > [{gen_server,terminate,6},{proc_lib,init_p_do_apply,3}]}}, > {initial_call,{couch_file,init,['Argument__1']}}, > {ancestors,[<0.12762.45>]}, > {messages,[]}, > {links,[#Port<0.904761>]}, > {dictionary,[]}, > {trap_exit,true}, > {status,running}, > {heap_size,987}, > {stack_size,23}, > {reductions,836156}], > []]}} > [Mon, 05 Jan 2009 19:43:02 GMT] [error] [<0.22399.43>] ** Generic server <0.22399.43> terminating > ** Last message in was {'EXIT',<0.10848.41>, > {timeout, > {gen_server,call, > [<0.12768.45>, > {write, > <<0,0,1,36,131,104,2,104,1,108,0,0,0,8,104,2, > 109,0,0,0,7,112,114,111,100,117,99,116,109, > 0,0,0,8,54,53,49,52,48,47,69,75,104,2,109,0, > 0,0,11,116,114,97,110,115,97,99,116,105,111, > 110,109,0,0,0,8,114,101,116,114,105,101,118, > 101,104,2,109,0,0,0,4,116,121,112,101,109,0, > 0,0,4,117,110,105,116,104,2,109,0,0,0,11,97, > 114,99,104,105,118,101,100,95,97,116,109,0, > 0,0,22,50,48,48,56,48,50,50,50,84,49,50,49, > 52,48,53,46,53,50,54,51,56,52,104,2,109,0,0, > 0,10,99,114,101,97,116,101,100,95,97,116, > 109,0,0,0,22,50,48,48,55,49,49,50,56,84,49, > 53,52,50,48,54,46,51,52,52,54,49,56,104,2, > 109,0,0,0,4,112,114,111,112,104,1,108,0,0,0, > 2,104,2,109,0,0,0,8,108,111,99,97,116,105, > 111,110,109,0,0,0,6,65,85,83,76,65,71,104,2, > 109,0,0,0,6,104,101,105,103,104,116,98,0,0, > 7,158,106,104,2,109,0,0,0,3,109,117,105,109, > 0,0,0,18,51,52,48,48,53,57,57,56,49,48,48, > 48,48,51,49,50,53,50,104,2,109,0,0,0,8,113, > 117,97,110,116,105,116,121,97,11,106,106>>}]}}} > ** When Server state == {file_descriptor,prim_file,{#Port<0.904494>,16}} > ** Reason for termination == > ** {timeout,{gen_server,call, > [<0.12768.45>, > {write,<<0,0,1,36,131,104,2,104,1,108,0,0,0,8,104, > 2,109,0,0,0,7,112,114,111,100,117,99,116, > 109,0,0,0,8,54,53,49,52,48,47,69,75,104, > 2,109,0,0,0,11,116,114,97,110,115,97,99, > 116,105,111,110,109,0,0,0,8,114,101,116, > 114,105,101,118,101,104,2,109,0,0,0,4, > 116,121,112,101,109,0,0,0,4,117,110,105, > 116,104,2,109,0,0,0,11,97,114,99,104,105, > 118,101,100,95,97,116,109,0,0,0,22,50,48, > 48,56,48,50,50,50,84,49,50,49,52,48,53, > 46,53,50,54,51,56,52,104,2,109,0,0,0,10, > 99,114,101,97,116,101,100,95,97,116,109, > 0,0,0,22,50,48,48,55,49,49,50,56,84,49, > 53,52,50,48,54,46,51,52,52,54,49,56,104, > 2,109,0,0,0,4,112,114,111,112,104,1,108, > 0,0,0,2,104,2,109,0,0,0,8,108,111,99,97, > 116,105,111,110,109,0,0,0,6,65,85,83,76, > 65,71,104,2,109,0,0,0,6,104,101,105,103, > 104,116,98,0,0,7,158,106,104,2,109,0,0,0, > 3,109,117,105,109,0,0,0,18,51,52,48,48, > 53,57,57,56,49,48,48,48,48,51,49,50,53, > 50,104,2,109,0,0,0,8,113,117,97,110,116, > 105,116,121,97,11,106,106>>}]}} > [Mon, 05 Jan 2009 19:43:28 GMT] [error] [<0.22399.43>] {error_report,<0.22.0>, > {<0.22399.43>,crash_report, > [[{pid,<0.22399.43>}, > {registered_name,[]}, > {error_info, > {exit, > {timeout, > {gen_server,call, > [<0.12768.45>, > {write, > <<0,0,1,36,131,104,2,104,1,108,0,0,0,8,104,2, > 109,0,0,0,7,112,114,111,100,117,99,116,109,0, > 0,0,8,54,53,49,52,48,47,69,75,104,2,109,0,0, > 0,11,116,114,97,110,115,97,99,116,105,111, > 110,109,0,0,0,8,114,101,116,114,105,101,118, > 101,104,2,109,0,0,0,4,116,121,112,101,109,0, > 0,0,4,117,110,105,116,104,2,109,0,0,0,11,97, > 114,99,104,105,118,101,100,95,97,116,109,0,0, > 0,22,50,48,48,56,48,50,50,50,84,49,50,49,52, > 48,53,46,53,50,54,51,56,52,104,2,109,0,0,0, > 10,99,114,101,97,116,101,100,95,97,116,109,0, > 0,0,22,50,48,48,55,49,49,50,56,84,49,53,52, > 50,48,54,46,51,52,52,54,49,56,104,2,109,0,0, > 0,4,112,114,111,112,104,1,108,0,0,0,2,104,2, > 109,0,0,0,8,108,111,99,97,116,105,111,110, > 109,0,0,0,6,65,85,83,76,65,71,104,2,109,0,0, > 0,6,104,101,105,103,104,116,98,0,0,7,158,106, > 104,2,109,0,0,0,3,109,117,105,109,0,0,0,18, > 51,52,48,48,53,57,57,56,49,48,48,48,48,51,49, > 50,53,50,104,2,109,0,0,0,8,113,117,97,110, > 116,105,116,121,97,11,106,106>>}]}}, > [{gen_server,terminate,6},{proc_lib,init_p_do_apply,3}]}}, > {initial_call,{couch_file,init,['Argument__1']}}, > {ancestors, > [<0.10848.41>,<0.10847.41>,couch_server,couch_primary_services, > couch_server_sup,<0.1.0>]}, > {messages, > [{'DOWN',#Ref<0.0.81.132266>,process,<0.10847.41>, > {timeout, > {gen_server,call, > [<0.12768.45>, > {write, > <<0,0,1,36,131,104,2,104,1,108,0,0,0,8,104, > 2,109,0,0,0,7,112,114,111,100,117,99,116, > 109,0,0,0,8,54,53,49,52,48,47,69,75,104, > 2,109,0,0,0,11,116,114,97,110,115,97,99, > 116,105,111,110,109,0,0,0,8,114,101,116, > 114,105,101,118,101,104,2,109,0,0,0,4, > 116,121,112,101,109,0,0,0,4,117,110,105, > 116,104,2,109,0,0,0,11,97,114,99,104,105, > 118,101,100,95,97,116,109,0,0,0,22,50,48, > 48,56,48,50,50,50,84,49,50,49,52,48,53, > 46,53,50,54,51,56,52,104,2,109,0,0,0,10, > 99,114,101,97,116,101,100,95,97,116,109, > 0,0,0,22,50,48,48,55,49,49,50,56,84,49, > 53,52,50,48,54,46,51,52,52,54,49,56,104, > 2,109,0,0,0,4,112,114,111,112,104,1,108, > 0,0,0,2,104,2,109,0,0,0,8,108,111,99,97, > 116,105,111,110,109,0,0,0,6,65,85,83,76, > 65,71,104,2,109,0,0,0,6,104,101,105,103, > 104,116,98,0,0,7,158,106,104,2,109,0,0,0, > 3,109,117,105,109,0,0,0,18,51,52,48,48, > 53,57,57,56,49,48,48,48,48,51,49,50,53, > 50,104,2,109,0,0,0,8,113,117,97,110,116, > 105,116,121,97,11,106,106>>}]}}}]}, > {links,[#Port<0.904494>]}, > {dictionary,[{<0.10847.41>,{#Ref<0.0.81.132266>,1}}]}, > {trap_exit,true}, > {status,running}, > {heap_size,987}, > {stack_size,23}, > {reductions,5627554}], > []]}} > (and nothing further) > I still can access couchdb1 (the source) but every trivial request takes exactly 5015ms: > balancer:/filespace/couchdb/log# time curl -i http://127.0.0.1:5984/ > HTTP/1.1 200 OK > Server: CouchDB/0.9.0a731357-incubating (Erlang OTP/R12B) > Date: Mon, 05 Jan 2009 20:45:46 GMT > Content-Type: text/plain;charset=utf-8 > Content-Length: 102 > Cache-Control: must-revalidate > {"couchdb":"Welcome","version":"0.9.0a731357-incubating","start_time":"Sun, 04 Jan 2009 21:43:13 GMT"} > real 0m5.015s > user 0m0.008s > sys 0m0.000s > For these accesses no log entries on couchdb1 are created. > Meanwhile on the destination server (couchdb2) I can see lot of activity: > [Mon, 05 Jan 2009 20:47:58 GMT] [info] [<0.19601.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs 200 > ... 40 lines... > [Mon, 05 Jan 2009 20:47:58 GMT] [info] [<0.19644.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs 200 > [Mon, 05 Jan 2009 20:48:13 GMT] [info] [<0.19652.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs 200 > ... ca 200 lines > [Mon, 05 Jan 2009 20:48:13 GMT] [info] [<0.19744.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs 200 > [Mon, 05 Jan 2009 20:48:28 GMT] [info] [<0.19747.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs 200 > ... ca 200 lines > [Mon, 05 Jan 2009 20:48:28 GMT] [info] [<0.19844.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs 200 > [Mon, 05 Jan 2009 20:48:43 GMT] [info] [<0.19944.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs 200 > ... ca 200 lines > [Mon, 05 Jan 2009 20:48:58 GMT] [info] [<0.19948.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs 200 > ... ca 200 lines > [Mon, 05 Jan 2009 20:48:58 GMT] [info] [<0.20044.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs 200 > [Mon, 05 Jan 2009 20:49:13 GMT] [info] [<0.20045.5>] 172.28.4.107 - - 'POST' /hulog_events/_missing_revs 200 > But the number of documents on the destination servers hasnt been incerasing in the meantime: > {"db_name":"hulog_events","doc_count":25926,"doc_del_count":10074,"update_seq":36000,"purge_seq":0,"compact_running":false,"disk_size":21927524} > couchdb1 is CouchDB 0.9.0a731357-incubating > couchdb2 is CouchDB 0.9.0a730405-incubating -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.