Return-Path: X-Original-To: apmail-couchdb-dev-archive@www.apache.org Delivered-To: apmail-couchdb-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 3EB03DB6E for ; Thu, 4 Oct 2012 00:29:30 +0000 (UTC) Received: (qmail 37638 invoked by uid 500); 4 Oct 2012 00:29:29 -0000 Delivered-To: apmail-couchdb-dev-archive@couchdb.apache.org Received: (qmail 37584 invoked by uid 500); 4 Oct 2012 00:29:29 -0000 Mailing-List: contact dev-help@couchdb.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@couchdb.apache.org Delivered-To: mailing list dev@couchdb.apache.org Received: (qmail 37575 invoked by uid 99); 4 Oct 2012 00:29:29 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 04 Oct 2012 00:29:29 +0000 X-ASF-Spam-Status: No, hits=3.2 required=5.0 tests=HTML_MESSAGE,SPF_SOFTFAIL X-Spam-Check-By: apache.org Received-SPF: softfail (athena.apache.org: transitioning domain of dustin@spy.net does not designate 69.230.8.158 as permitted sender) Received: from [69.230.8.158] (HELO z.west.spy.net) (69.230.8.158) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 04 Oct 2012 00:29:25 +0000 Received: by z.west.spy.net (Postfix, from userid 25) id F26CFAE5; Wed, 3 Oct 2012 17:29:01 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eve.west.spy.net X-Spam-Level: Received: from dhcp-110.west.spy.net (dhcp-110.west.spy.net [192.168.1.110]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (Client did not present a certificate) by z.west.spy.net (Postfix) with ESMTPSA id A7CD4AE3 for ; Wed, 3 Oct 2012 17:28:56 -0700 (PDT) From: Dustin Sallings Content-Type: multipart/alternative; boundary="Apple-Mail=_A6410455-AF9E-4978-9A68-6AD7ACF4DB04" Subject: Fwd: replication problems Date: Wed, 3 Oct 2012 17:28:57 -0700 References: <2EC65984-7E61-4523-8CE0-372BCB898AE7@spy.net> To: "dev@couchdb.apache.org" Message-Id: <4A889776-299A-4F57-92A3-09CCC6AE6792@spy.net> Mime-Version: 1.0 (Mac OS X Mail 6.1 \(1498\)) X-Mailer: Apple Mail (2.1498) X-Virus-Checked: Checked by ClamAV on apache.org X-Old-Spam-Status: No, score=-3.1 required=5.0 tests=ALL_TRUSTED,HTML_MESSAGE, RP_MATCHES_RCVD autolearn=unavailable version=3.3.2 --Apple-Mail=_A6410455-AF9E-4978-9A68-6AD7ACF4DB04 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=iso-8859-1 I'm bringing this back up as requested. I'm currently = simultaneously in the "not replicating interesting things" and "has = duplicate replicates state". I think the stuff below shows the "not = replicating" stuff. Active tasks shows the other (these are based on replicator DB = documents (example below): [ { "checkpointed_source_seq": 2022317,=20 "continuous": true,=20 "doc_id": "cbstats-from-dogbowl",=20 "doc_write_failures": 0,=20 "docs_read": 300,=20 "docs_written": 300,=20 "missing_revisions_found": 300,=20 "pid": "<0.10466.12>",=20 "progress": 100,=20 "replication_id": "50daecd0a29f4b7e5d102990831f3d64+continuous",=20= "revisions_checked": 304,=20 "source": "http://dustin:*****@single.couchbase.net/cbstats/",=20= "source_seq": 2022317,=20 "started_on": 1349309457,=20 "target": "cbstats",=20 "type": "replication",=20 "updated_on": 1349310442 },=20 { "checkpointed_source_seq": 2022317,=20 "continuous": true,=20 "doc_id": "cbstats-from-dogbowl",=20 "doc_write_failures": 0,=20 "docs_read": 62,=20 "docs_written": 62,=20 "missing_revisions_found": 62,=20 "pid": "<0.11019.12>",=20 "progress": 100,=20 "replication_id": "411e341d5aa9a3fe636cf4ea8ba71720+continuous",=20= "revisions_checked": 304,=20 "source": "http://dustin:*****@single.couchbase.net/cbstats/",=20= "source_seq": 2022317,=20 "started_on": 1349309471,=20 "target": "cbstats",=20 "type": "replication",=20 "updated_on": 1349310443 },=20 { "checkpointed_source_seq": 107068,=20 "continuous": true,=20 "doc_id": "gerrit-from-prod",=20 "doc_write_failures": 0,=20 "docs_read": 22,=20 "docs_written": 22,=20 "missing_revisions_found": 22,=20 "pid": "<0.11086.12>",=20 "progress": 100,=20 "replication_id": "4a21031dac0d81637a23c32bad620be9+continuous",=20= "revisions_checked": 26,=20 "source": "http://dustinphoto.iriscouch.com/gerrit/",=20 "source_seq": 107068,=20 "started_on": 1349309487,=20 "target": "gerrit",=20 "type": "replication",=20 "updated_on": 1349310445 },=20 { "checkpointed_source_seq": 107068,=20 "continuous": true,=20 "doc_id": "gerrit-from-prod",=20 "doc_write_failures": 0,=20 "docs_read": 17,=20 "docs_written": 17,=20 "missing_revisions_found": 17,=20 "pid": "<0.11107.12>",=20 "progress": 100,=20 "replication_id": "b4ad5d3f2e5b78670e4c8364b18000e9+continuous",=20= "revisions_checked": 26,=20 "source": "http://dustinphoto.iriscouch.com/gerrit/",=20 "source_seq": 107068,=20 "started_on": 1349309488,=20 "target": "gerrit",=20 "type": "replication",=20 "updated_on": 1349310445 } ] The replicator document for the latter, for example is this: { "_id": "gerrit-from-prod", "_rev": "2235-36de10fb757581a1782dacbb26ee4809", "source": "http://dustinphoto.iriscouch.com/gerrit", "target": "gerrit", "continuous": true, "user_ctx": { "roles": [ "_admin" ] }, "_replication_state_time": "2012-10-03T17:11:27-07:00", "_replication_id": "b4ad5d3f2e5b78670e4c8364b18000e9", "_replication_state": "triggered" } Begin forwarded message: > From: Dustin Sallings > Subject: Re: replication problems > Date: June 15, 2012 0:10:04 PDT > To: dev@couchdb.apache.org > Reply-To: dev@couchdb.apache.org >=20 >=20 > On Jun 14, 2012, at 11:28 PM, Benoit Chesneau wrote: >=20 >> Ar you using _replicate or _replicator ? Anything interresting in = logs? >=20 >=20 > I'm using _replicator (wonderful feature, I just kill the DB and = everything goes back the way I want it). >=20 > Hmm... I do think I found some stuff digging through the logs. = This is the local DB I noticed not doing its thing, although there were = tons of errors all around this. Looks like the server got into some = kind of bad state and sort of half-crashed. >=20 >=20 > [Thu, 14 Jun 2012 23:20:12 GMT] [error] [<0.133.0>] Replication = `ae601df0373da82d1b4a9ff741c8ba18+continuous` (`rpics` -> = `rpics-processed`) failed: = {{timeout,{gen_server,call,[<0.213.0>,{open_ref_count,<0.4 > 42.0>}]}}, > {gen_server,call, > [couch_server, > {open,<<"rpics">>, > = [{user_ctx,{user_ctx,null,[<<"_admin">>],undefined}}]}, > infinity]}} > [Thu, 14 Jun 2012 23:20:25 GMT] [error] [<0.383.0>] ** Generic server = <0.383.0> terminating=20 > ** Last message in was {'EXIT',<0.384.0>, > {{timeout, > {gen_server,call, > [<0.213.0>,{open_ref_count,<0.442.0>}]}}, > {gen_server,call, > [couch_server, > {open,<<"cbstats">>, > [{user_ctx, > = {user_ctx,null,[<<"_admin">>],undefined}}, > {user_ctx, > = {user_ctx,null,[<<"_admin">>],undefined}}]}, > infinity]}}} >=20 > ** When Server state =3D=3D {state,<0.272.0>,<0.384.0>,20, > {httpdb, > = "http://dustin:LOGGED_PASSWORD@single.couchbase.net/cbstats/", > nil, > [{"Accept","application/json"}, > {"User-Agent","CouchDB/1.2.0"}], > 30000, > [{socket_options, > [{keepalive,true},{nodelay,false}]}], > 10,250,<0.273.0>,20}, > = {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>, > <0.290.0>,<0.286.0>,<0.367.0>, > {db_header,6,984356,0, > {860345646,{737369,975,640891414},59433736}, > {860348005,738344,42056446}, > {860352635,[],5737}, > 0,nil,nil,1000}, > 984356, > {btree,<0.286.0>, > {860345646,{737369,975,640891414},59433736}, > #Fun, > #Fun, > #Fun, > #Fun,snappy}, > {btree,<0.286.0>, > {860348005,738344,42056446}, > #Fun, > #Fun, > #Fun, > #Fun,snappy}, > {btree,<0.286.0>, > {860352635,[],5737}, > #Fun, > #Fun, > #Fun,nil,snappy}, > 984356,<<"cbstats">>, > = "/Volumes/terror/db/couchdb/cbstats.couch",[],[], > nil, > {user_ctx,null,[<<"_admin">>],undefined}, > nil,1000, > [before_header,after_header,on_file_open], > [{user_ctx, > {user_ctx,null,[<<"_admin">>],undefined}}], > snappy,nil,nil}, > [],nil,nil,nil, > {rep_stats,0,0,0,0,0}, > nil,<0.385.0>, > {batch,[],0}} > ** Reason for termination =3D=3D=20 > ** {noproc,{gen_server,call,[<0.367.0>,{drop,<0.383.0>},infinity]}} >=20 >=20 >=20 >=20 > Scrolling to the beginning of the errors, I find this: >=20 >=20 > [Thu, 14 Jun 2012 23:15:54 GMT] [error] [<0.164.0>] Replication = `543f76281e8d52d6ce5b51fddf0588e7+continuous` (`photo` -> = `http://dustin:*****@dustinphoto.couchone.com/photo/`) failed: = source_db_down > [Thu, 14 Jun 2012 23:18:57 GMT] [info] [<0.358.0>] 127.0.0.1 - - GET = /_all_dbs 200 > [Thu, 14 Jun 2012 23:19:52 GMT] [error] [<0.289.0>] ** Generic server = <0.289.0> terminating=20 > ** Last message in was {update_docs,<0.272.0>,[], > [{{doc, > = <<"_local/c4cc070f896d7267e52ba012856fed4b">>, > {0,[<<"346185">>]}, > {[{<<"session_id">>, > = <<"9fb3475683d44bb1e151031dd42cc59f">>}, > {<<"source_last_seq">>,1419004}, > {<<"replication_id_version">>,2}, > {<<"history">>, > [{[{<<"session_id">>, > = <<"9fb3475683d44bb1e151031dd42cc59f">>}, > {<<"start_time">>, > <<"Thu, 14 Jun 2012 01:35:02 = GMT">>}, > {<<"end_time">>, > <<"Thu, 14 Jun 2012 23:15:29 = GMT">>}, > {<<"start_last_seq">>,1410146}, > {<<"end_last_seq">>,1419004}, > {<<"recorded_seq">>,1419004}, > {<<"missing_checked">>,8100}, > {<<"missing_found">>,8100}, > {<<"docs_read">>,8100}, > {<<"docs_written">>,8100}, > {<<"doc_write_failures">>,0}]}, > {[{<<"session_id">>, > = <<"3edd7c50327eab7ec0768451e34efa8b">>}, > {<<"start_time">>, > <<"Tue, 12 Jun 2012 05:51:17 = GMT">>}, > {<<"end_time">>, > <<"Tue, 12 Jun 2012 13:02:37 = GMT">>}, > {<<"start_last_seq">>,1407186}, > {<<"end_last_seq">>,1410146}, > {<<"recorded_seq">>,1410146}, > {<<"missing_checked">>,2583}, > {<<"missing_found">>,2577}, > {<<"docs_read">>,2577}, > {<<"docs_written">>,2577}, > {<<"doc_write_failures">>,0}]}, > {[{<<"session_id">>, > = <<"172de62044281a01b1584a9d099f42af">>}, > {<<"start_time">>, > <<"Mon, 11 Jun 2012 03:40:11 = GMT">>}, > {<<"end_time">>, > <<"Mon, 11 Jun 2012 15:16:24 = GMT">>}, > {<<"start_last_seq">>,1405428}, > {<<"end_last_seq">>,1407186}, > {<<"recorded_seq">>,1407186}, > {<<"missing_checked">>,1721}, > {<<"missing_found">>,1721}, > {<<"docs_read">>,1721}, > {<<"docs_written">>,1721}, > {<<"doc_write_failures">>,0}]}, > {[{<<"session_id">>, > = <<"e60a126a2036c5fab00a1249101820c8">>}, > {<<"start_time">>, > <<"Sat, 09 Jun 2012 07:47:22 = GMT">>}, > {<<"end_time">>, > <<"Sun, 10 Jun 2012 21:16:20 = GMT">>}, > {<<"start_last_seq">>,1386289}, > {<<"end_last_seq">>,1405428}, > {<<"recorded_seq">>,1405428}, > {<<"missing_checked">>,16977}, > {<<"missing_found">>,16977}, > {<<"docs_read">>,16977}, > {<<"docs_written">>,16977}, > {<<"doc_write_failures">>,0}]}, > {[{<<"session_id">>, > = <<"ef3e4333d340dcf73ddfa3fe8c720042">>}, > {<<"start_time">>, > <<"Mon, 04 Jun 2012 02:39:44 = GMT">>}, > {<<"end_time">>, > <<"Mon, 04 Jun 2012 12:35:50 = GMT">>}, > {<<"start_last_seq">>,1384738}, > {<<"end_last_seq">>,1386289}, > {<<"recorded_seq">>,1386289}, > {<<"missing_checked">>,1551}, > {<<"missing_found">>,1550}, > {<<"docs_read">>,1550}, > {<<"docs_written">>,1550}, > {<<"doc_write_failures">>,0}]}, > {[{<<"session_id">>, > = <<"d5123a3caf462794aaf5a47be1bb3b6e">>}, > {<<"start_time">>, > <<"Wed, 30 May 2012 20:41:43 = GMT">>}, > {<<"end_time">>, > <<"Mon, 04 Jun 2012 02:37:33 = GMT">>}, > {<<"start_last_seq">>,1372404}, > {<<"end_last_seq">>,1384738}, > {<<"recorded_seq">>,1384738}, > {<<"missing_checked">>,12334}, > {<<"missing_found">>,12333}, > {<<"docs_read">>,12333}, > {<<"docs_written">>,12333}, > {<<"doc_write_failures">>,0}]}, > {[{<<"session_id">>, > = <<"52a16e8832f70dc094f6fff5e9b7d75b">>}, > {<<"start_time">>, > <<"Sun, 27 May 2012 23:36:41 = GMT">>}, > {<<"end_time">>, > <<"Wed, 30 May 2012 20:40:14 = GMT">>}, > {<<"start_last_seq">>,1361049}, > {<<"end_last_seq">>,1372404}, > {<<"recorded_seq">>,1372404}, > {<<"missing_checked">>,11355}, > {<<"missing_found">>,11355}, > {<<"docs_read">>,11355}, > {<<"docs_written">>,11355}, > {<<"doc_write_failures">>,0}]}, > [...lots of these...] >=20 > [],false,[]}, > #Ref<0.0.15.159973>}], > false,false} > ** When Server state =3D=3D = {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>, > <0.290.0>,<0.286.0>,<0.367.0>, > {db_header,6,992456,0, > = {943280145,{744250,975,647546641},60017672}, > {943282327,745225,42485979}, > {943267963,[],5753}, > 0,nil,nil,1000}, > 992456, > {btree,<0.286.0>, > = {943280145,{744250,975,647546641},60017672}, > #Fun, > #Fun, > #Fun, > = #Fun,snappy}, > {btree,<0.286.0>, > {943282327,745225,42485979}, > #Fun, > #Fun, > #Fun, > = #Fun,snappy}, > {btree,<0.286.0>, > {943267963,[],5753}, > #Fun, > #Fun, > = #Fun,nil,snappy}, > 992456,<<"cbstats">>, > = "/Volumes/terror/db/couchdb/cbstats.couch",[],[], > nil, > {user_ctx,null,[],undefined}, > nil,1000, > [before_header,after_header,on_file_open], > [{user_ctx, > = {user_ctx,null,[<<"_admin">>],undefined}}], > snappy,nil,nil} > ** Reason for termination =3D=3D=20 > ** {timeout, > {gen_server,call, > [<0.288.0>, > {db_updated, > = {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>,<0.290.0>, > <0.286.0>,<0.367.0>, > {db_header,6,992456,0, > {943280145,{744250,975,647546641},60017672}, > {943282327,745225,42485979}, > {943267963,[],5753}, > 0,nil,nil,1000}, > 992456, > {btree,<0.286.0>, > {943280145,{744250,975,647546641},60017672}, > #Fun, > #Fun, > #Fun, > #Fun,snappy}, > {btree,<0.286.0>, > {943282327,745225,42485979}, > #Fun, > #Fun, > #Fun, > #Fun,snappy}, > {btree,<0.286.0>, > {943284347,[],5756}, > #Fun, > #Fun, > #Fun,nil,snappy}, > 992456,<<"cbstats">>, > = "/Volumes/terror/db/couchdb/cbstats.couch",[],[],nil, > {user_ctx,null,[],undefined}, > #Ref<0.0.15.160107>,1000, > [before_header,after_header,on_file_open], > = [{user_ctx,{user_ctx,null,[<<"_admin">>],undefined}}], > snappy,nil,nil}}]}} >=20 >=20 >=20 >=20 > --=20 > dustin sallings >=20 >=20 >=20 --=20 dustin sallings --Apple-Mail=_A6410455-AF9E-4978-9A68-6AD7ACF4DB04--