Return-Path: X-Original-To: apmail-couchdb-dev-archive@www.apache.org Delivered-To: apmail-couchdb-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id DB7CF9FF6 for ; Wed, 10 Oct 2012 20:12:41 +0000 (UTC) Received: (qmail 17673 invoked by uid 500); 10 Oct 2012 20:12:41 -0000 Delivered-To: apmail-couchdb-dev-archive@couchdb.apache.org Received: (qmail 17631 invoked by uid 500); 10 Oct 2012 20:12:41 -0000 Mailing-List: contact dev-help@couchdb.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@couchdb.apache.org Delivered-To: mailing list dev@couchdb.apache.org Received: (qmail 17622 invoked by uid 99); 10 Oct 2012 20:12:41 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 10 Oct 2012 20:12:41 +0000 X-ASF-Spam-Status: No, hits=-0.0 required=5.0 tests=RCVD_IN_DNSWL_LOW,SPF_NEUTRAL X-Spam-Check-By: apache.org Received-SPF: neutral (nike.apache.org: local policy) Received: from [80.244.253.218] (HELO mail.traeumt.net) (80.244.253.218) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 10 Oct 2012 20:12:33 +0000 Received: from [10.0.0.10] (91-64-36-132-dynip.superkabel.de [91.64.36.132]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (No client certificate requested) by mail.traeumt.net (Postfix) with ESMTPSA id 529D8140B9 for ; Wed, 10 Oct 2012 22:05:54 +0200 (CEST) Content-Type: text/plain; charset=iso-8859-1 Mime-Version: 1.0 (Mac OS X Mail 6.1 \(1498\)) Subject: Re: replication problems From: Jan Lehnardt In-Reply-To: <4A889776-299A-4F57-92A3-09CCC6AE6792@spy.net> Date: Wed, 10 Oct 2012 22:12:09 +0200 Content-Transfer-Encoding: quoted-printable Message-Id: <3E55AC16-BBD6-4CBA-BB7F-F5AC2FCE566F@apache.org> References: <2EC65984-7E61-4523-8CE0-372BCB898AE7@spy.net> <4A889776-299A-4F57-92A3-09CCC6AE6792@spy.net> To: dev@couchdb.apache.org X-Mailer: Apple Mail (2.1498) Filipe tells me this is = https://issues.apache.org/jira/browse/COUCHDB-1259 Cheers Jan -- On Oct 4, 2012, at 02:28 , Dustin Sallings wrote: >=20 > I'm bringing this back up as requested. I'm currently = simultaneously in the "not replicating interesting things" and "has = duplicate replicates state". I think the stuff below shows the "not = replicating" stuff. >=20 > Active tasks shows the other (these are based on replicator DB = documents (example below): >=20 > [ > { > "checkpointed_source_seq": 2022317,=20 > "continuous": true,=20 > "doc_id": "cbstats-from-dogbowl",=20 > "doc_write_failures": 0,=20 > "docs_read": 300,=20 > "docs_written": 300,=20 > "missing_revisions_found": 300,=20 > "pid": "<0.10466.12>",=20 > "progress": 100,=20 > "replication_id": = "50daecd0a29f4b7e5d102990831f3d64+continuous",=20 > "revisions_checked": 304,=20 > "source": "http://dustin:*****@single.couchbase.net/cbstats/",=20= > "source_seq": 2022317,=20 > "started_on": 1349309457,=20 > "target": "cbstats",=20 > "type": "replication",=20 > "updated_on": 1349310442 > },=20 > { > "checkpointed_source_seq": 2022317,=20 > "continuous": true,=20 > "doc_id": "cbstats-from-dogbowl",=20 > "doc_write_failures": 0,=20 > "docs_read": 62,=20 > "docs_written": 62,=20 > "missing_revisions_found": 62,=20 > "pid": "<0.11019.12>",=20 > "progress": 100,=20 > "replication_id": = "411e341d5aa9a3fe636cf4ea8ba71720+continuous",=20 > "revisions_checked": 304,=20 > "source": "http://dustin:*****@single.couchbase.net/cbstats/",=20= > "source_seq": 2022317,=20 > "started_on": 1349309471,=20 > "target": "cbstats",=20 > "type": "replication",=20 > "updated_on": 1349310443 > },=20 > { > "checkpointed_source_seq": 107068,=20 > "continuous": true,=20 > "doc_id": "gerrit-from-prod",=20 > "doc_write_failures": 0,=20 > "docs_read": 22,=20 > "docs_written": 22,=20 > "missing_revisions_found": 22,=20 > "pid": "<0.11086.12>",=20 > "progress": 100,=20 > "replication_id": = "4a21031dac0d81637a23c32bad620be9+continuous",=20 > "revisions_checked": 26,=20 > "source": "http://dustinphoto.iriscouch.com/gerrit/",=20 > "source_seq": 107068,=20 > "started_on": 1349309487,=20 > "target": "gerrit",=20 > "type": "replication",=20 > "updated_on": 1349310445 > },=20 > { > "checkpointed_source_seq": 107068,=20 > "continuous": true,=20 > "doc_id": "gerrit-from-prod",=20 > "doc_write_failures": 0,=20 > "docs_read": 17,=20 > "docs_written": 17,=20 > "missing_revisions_found": 17,=20 > "pid": "<0.11107.12>",=20 > "progress": 100,=20 > "replication_id": = "b4ad5d3f2e5b78670e4c8364b18000e9+continuous",=20 > "revisions_checked": 26,=20 > "source": "http://dustinphoto.iriscouch.com/gerrit/",=20 > "source_seq": 107068,=20 > "started_on": 1349309488,=20 > "target": "gerrit",=20 > "type": "replication",=20 > "updated_on": 1349310445 > } > ] >=20 >=20 > The replicator document for the latter, for example is this: >=20 > { > "_id": "gerrit-from-prod", > "_rev": "2235-36de10fb757581a1782dacbb26ee4809", > "source": "http://dustinphoto.iriscouch.com/gerrit", > "target": "gerrit", > "continuous": true, > "user_ctx": { > "roles": [ > "_admin" > ] > }, > "_replication_state_time": "2012-10-03T17:11:27-07:00", > "_replication_id": "b4ad5d3f2e5b78670e4c8364b18000e9", > "_replication_state": "triggered" > } >=20 >=20 > Begin forwarded message: >=20 >> From: Dustin Sallings >> Subject: Re: replication problems >> Date: June 15, 2012 0:10:04 PDT >> To: dev@couchdb.apache.org >> Reply-To: dev@couchdb.apache.org >>=20 >>=20 >> On Jun 14, 2012, at 11:28 PM, Benoit Chesneau wrote: >>=20 >>> Ar you using _replicate or _replicator ? Anything interresting in = logs? >>=20 >>=20 >> I'm using _replicator (wonderful feature, I just kill the DB and = everything goes back the way I want it). >>=20 >> Hmm... I do think I found some stuff digging through the logs. = This is the local DB I noticed not doing its thing, although there were = tons of errors all around this. Looks like the server got into some = kind of bad state and sort of half-crashed. >>=20 >>=20 >> [Thu, 14 Jun 2012 23:20:12 GMT] [error] [<0.133.0>] Replication = `ae601df0373da82d1b4a9ff741c8ba18+continuous` (`rpics` -> = `rpics-processed`) failed: = {{timeout,{gen_server,call,[<0.213.0>,{open_ref_count,<0.4 >> 42.0>}]}}, >> {gen_server,call, >> [couch_server, >> {open,<<"rpics">>, >> = [{user_ctx,{user_ctx,null,[<<"_admin">>],undefined}}]}, >> infinity]}} >> [Thu, 14 Jun 2012 23:20:25 GMT] [error] [<0.383.0>] ** Generic server = <0.383.0> terminating=20 >> ** Last message in was {'EXIT',<0.384.0>, >> {{timeout, >> {gen_server,call, >> [<0.213.0>,{open_ref_count,<0.442.0>}]}}, >> {gen_server,call, >> [couch_server, >> {open,<<"cbstats">>, >> [{user_ctx, >> = {user_ctx,null,[<<"_admin">>],undefined}}, >> {user_ctx, >> = {user_ctx,null,[<<"_admin">>],undefined}}]}, >> infinity]}}} >>=20 >> ** When Server state =3D=3D {state,<0.272.0>,<0.384.0>,20, >> {httpdb, >> = "http://dustin:LOGGED_PASSWORD@single.couchbase.net/cbstats/", >> nil, >> [{"Accept","application/json"}, >> {"User-Agent","CouchDB/1.2.0"}], >> 30000, >> [{socket_options, >> [{keepalive,true},{nodelay,false}]}], >> 10,250,<0.273.0>,20}, >> = {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>, >> <0.290.0>,<0.286.0>,<0.367.0>, >> {db_header,6,984356,0, >> {860345646,{737369,975,640891414},59433736}, >> {860348005,738344,42056446}, >> {860352635,[],5737}, >> 0,nil,nil,1000}, >> 984356, >> {btree,<0.286.0>, >> {860345646,{737369,975,640891414},59433736}, >> #Fun, >> #Fun, >> #Fun, >> #Fun,snappy}, >> {btree,<0.286.0>, >> {860348005,738344,42056446}, >> #Fun, >> #Fun, >> #Fun, >> #Fun,snappy}, >> {btree,<0.286.0>, >> {860352635,[],5737}, >> #Fun, >> #Fun, >> #Fun,nil,snappy}, >> 984356,<<"cbstats">>, >> = "/Volumes/terror/db/couchdb/cbstats.couch",[],[], >> nil, >> {user_ctx,null,[<<"_admin">>],undefined}, >> nil,1000, >> [before_header,after_header,on_file_open], >> [{user_ctx, >> {user_ctx,null,[<<"_admin">>],undefined}}], >> snappy,nil,nil}, >> [],nil,nil,nil, >> {rep_stats,0,0,0,0,0}, >> nil,<0.385.0>, >> {batch,[],0}} >> ** Reason for termination =3D=3D=20 >> ** {noproc,{gen_server,call,[<0.367.0>,{drop,<0.383.0>},infinity]}} >>=20 >>=20 >>=20 >>=20 >> Scrolling to the beginning of the errors, I find this: >>=20 >>=20 >> [Thu, 14 Jun 2012 23:15:54 GMT] [error] [<0.164.0>] Replication = `543f76281e8d52d6ce5b51fddf0588e7+continuous` (`photo` -> = `http://dustin:*****@dustinphoto.couchone.com/photo/`) failed: = source_db_down >> [Thu, 14 Jun 2012 23:18:57 GMT] [info] [<0.358.0>] 127.0.0.1 - - GET = /_all_dbs 200 >> [Thu, 14 Jun 2012 23:19:52 GMT] [error] [<0.289.0>] ** Generic server = <0.289.0> terminating=20 >> ** Last message in was {update_docs,<0.272.0>,[], >> [{{doc, >> = <<"_local/c4cc070f896d7267e52ba012856fed4b">>, >> {0,[<<"346185">>]}, >> {[{<<"session_id">>, >> = <<"9fb3475683d44bb1e151031dd42cc59f">>}, >> {<<"source_last_seq">>,1419004}, >> {<<"replication_id_version">>,2}, >> {<<"history">>, >> [{[{<<"session_id">>, >> = <<"9fb3475683d44bb1e151031dd42cc59f">>}, >> {<<"start_time">>, >> <<"Thu, 14 Jun 2012 01:35:02 = GMT">>}, >> {<<"end_time">>, >> <<"Thu, 14 Jun 2012 23:15:29 = GMT">>}, >> {<<"start_last_seq">>,1410146}, >> {<<"end_last_seq">>,1419004}, >> {<<"recorded_seq">>,1419004}, >> {<<"missing_checked">>,8100}, >> {<<"missing_found">>,8100}, >> {<<"docs_read">>,8100}, >> {<<"docs_written">>,8100}, >> {<<"doc_write_failures">>,0}]}, >> {[{<<"session_id">>, >> = <<"3edd7c50327eab7ec0768451e34efa8b">>}, >> {<<"start_time">>, >> <<"Tue, 12 Jun 2012 05:51:17 = GMT">>}, >> {<<"end_time">>, >> <<"Tue, 12 Jun 2012 13:02:37 = GMT">>}, >> {<<"start_last_seq">>,1407186}, >> {<<"end_last_seq">>,1410146}, >> {<<"recorded_seq">>,1410146}, >> {<<"missing_checked">>,2583}, >> {<<"missing_found">>,2577}, >> {<<"docs_read">>,2577}, >> {<<"docs_written">>,2577}, >> {<<"doc_write_failures">>,0}]}, >> {[{<<"session_id">>, >> = <<"172de62044281a01b1584a9d099f42af">>}, >> {<<"start_time">>, >> <<"Mon, 11 Jun 2012 03:40:11 = GMT">>}, >> {<<"end_time">>, >> <<"Mon, 11 Jun 2012 15:16:24 = GMT">>}, >> {<<"start_last_seq">>,1405428}, >> {<<"end_last_seq">>,1407186}, >> {<<"recorded_seq">>,1407186}, >> {<<"missing_checked">>,1721}, >> {<<"missing_found">>,1721}, >> {<<"docs_read">>,1721}, >> {<<"docs_written">>,1721}, >> {<<"doc_write_failures">>,0}]}, >> {[{<<"session_id">>, >> = <<"e60a126a2036c5fab00a1249101820c8">>}, >> {<<"start_time">>, >> <<"Sat, 09 Jun 2012 07:47:22 = GMT">>}, >> {<<"end_time">>, >> <<"Sun, 10 Jun 2012 21:16:20 = GMT">>}, >> {<<"start_last_seq">>,1386289}, >> {<<"end_last_seq">>,1405428}, >> {<<"recorded_seq">>,1405428}, >> {<<"missing_checked">>,16977}, >> {<<"missing_found">>,16977}, >> {<<"docs_read">>,16977}, >> {<<"docs_written">>,16977}, >> {<<"doc_write_failures">>,0}]}, >> {[{<<"session_id">>, >> = <<"ef3e4333d340dcf73ddfa3fe8c720042">>}, >> {<<"start_time">>, >> <<"Mon, 04 Jun 2012 02:39:44 = GMT">>}, >> {<<"end_time">>, >> <<"Mon, 04 Jun 2012 12:35:50 = GMT">>}, >> {<<"start_last_seq">>,1384738}, >> {<<"end_last_seq">>,1386289}, >> {<<"recorded_seq">>,1386289}, >> {<<"missing_checked">>,1551}, >> {<<"missing_found">>,1550}, >> {<<"docs_read">>,1550}, >> {<<"docs_written">>,1550}, >> {<<"doc_write_failures">>,0}]}, >> {[{<<"session_id">>, >> = <<"d5123a3caf462794aaf5a47be1bb3b6e">>}, >> {<<"start_time">>, >> <<"Wed, 30 May 2012 20:41:43 = GMT">>}, >> {<<"end_time">>, >> <<"Mon, 04 Jun 2012 02:37:33 = GMT">>}, >> {<<"start_last_seq">>,1372404}, >> {<<"end_last_seq">>,1384738}, >> {<<"recorded_seq">>,1384738}, >> {<<"missing_checked">>,12334}, >> {<<"missing_found">>,12333}, >> {<<"docs_read">>,12333}, >> {<<"docs_written">>,12333}, >> {<<"doc_write_failures">>,0}]}, >> {[{<<"session_id">>, >> = <<"52a16e8832f70dc094f6fff5e9b7d75b">>}, >> {<<"start_time">>, >> <<"Sun, 27 May 2012 23:36:41 = GMT">>}, >> {<<"end_time">>, >> <<"Wed, 30 May 2012 20:40:14 = GMT">>}, >> {<<"start_last_seq">>,1361049}, >> {<<"end_last_seq">>,1372404}, >> {<<"recorded_seq">>,1372404}, >> {<<"missing_checked">>,11355}, >> {<<"missing_found">>,11355}, >> {<<"docs_read">>,11355}, >> {<<"docs_written">>,11355}, >> {<<"doc_write_failures">>,0}]}, >> [...lots of these...] >>=20 >> [],false,[]}, >> #Ref<0.0.15.159973>}], >> false,false} >> ** When Server state =3D=3D = {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>, >> <0.290.0>,<0.286.0>,<0.367.0>, >> {db_header,6,992456,0, >> = {943280145,{744250,975,647546641},60017672}, >> {943282327,745225,42485979}, >> {943267963,[],5753}, >> 0,nil,nil,1000}, >> 992456, >> {btree,<0.286.0>, >> = {943280145,{744250,975,647546641},60017672}, >> #Fun, >> #Fun, >> #Fun, >> = #Fun,snappy}, >> {btree,<0.286.0>, >> {943282327,745225,42485979}, >> #Fun, >> #Fun, >> #Fun, >> = #Fun,snappy}, >> {btree,<0.286.0>, >> {943267963,[],5753}, >> #Fun, >> #Fun, >> = #Fun,nil,snappy}, >> 992456,<<"cbstats">>, >> = "/Volumes/terror/db/couchdb/cbstats.couch",[],[], >> nil, >> {user_ctx,null,[],undefined}, >> nil,1000, >> [before_header,after_header,on_file_open], >> [{user_ctx, >> = {user_ctx,null,[<<"_admin">>],undefined}}], >> snappy,nil,nil} >> ** Reason for termination =3D=3D=20 >> ** {timeout, >> {gen_server,call, >> [<0.288.0>, >> {db_updated, >> = {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>,<0.290.0>, >> <0.286.0>,<0.367.0>, >> {db_header,6,992456,0, >> {943280145,{744250,975,647546641},60017672}, >> {943282327,745225,42485979}, >> {943267963,[],5753}, >> 0,nil,nil,1000}, >> 992456, >> {btree,<0.286.0>, >> {943280145,{744250,975,647546641},60017672}, >> #Fun, >> #Fun, >> #Fun, >> #Fun,snappy}, >> {btree,<0.286.0>, >> {943282327,745225,42485979}, >> #Fun, >> #Fun, >> #Fun, >> #Fun,snappy}, >> {btree,<0.286.0>, >> {943284347,[],5756}, >> #Fun, >> #Fun, >> #Fun,nil,snappy}, >> 992456,<<"cbstats">>, >> = "/Volumes/terror/db/couchdb/cbstats.couch",[],[],nil, >> {user_ctx,null,[],undefined}, >> #Ref<0.0.15.160107>,1000, >> [before_header,after_header,on_file_open], >> = [{user_ctx,{user_ctx,null,[<<"_admin">>],undefined}}], >> snappy,nil,nil}}]}} >>=20 >>=20 >>=20 >>=20 >> --=20 >> dustin sallings >>=20 >>=20 >>=20 >=20 > --=20 > dustin sallings >=20 >=20 >=20