Return-Path: X-Original-To: apmail-couchdb-dev-archive@www.apache.org Delivered-To: apmail-couchdb-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id C2D0392F4 for ; Fri, 15 Jun 2012 07:10:41 +0000 (UTC) Received: (qmail 51680 invoked by uid 500); 15 Jun 2012 07:10:41 -0000 Delivered-To: apmail-couchdb-dev-archive@couchdb.apache.org Received: (qmail 51438 invoked by uid 500); 15 Jun 2012 07:10:41 -0000 Mailing-List: contact dev-help@couchdb.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@couchdb.apache.org Delivered-To: mailing list dev@couchdb.apache.org Received: (qmail 51410 invoked by uid 99); 15 Jun 2012 07:10:40 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 15 Jun 2012 07:10:40 +0000 X-ASF-Spam-Status: No, hits=3.2 required=5.0 tests=HTML_MESSAGE,SPF_SOFTFAIL X-Spam-Check-By: apache.org Received-SPF: softfail (nike.apache.org: transitioning domain of dustin@spy.net does not designate 69.230.8.158 as permitted sender) Received: from [69.230.8.158] (HELO z.west.spy.net) (69.230.8.158) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 15 Jun 2012 07:10:32 +0000 Received: by z.west.spy.net (Postfix, from userid 25) id A797FD7B; Fri, 15 Jun 2012 00:10:08 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eve.west.spy.net X-Spam-Level: Received: from dhcp-110.west.spy.net (dhcp-110.west.spy.net [192.168.1.110]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (Client did not present a certificate) by z.west.spy.net (Postfix) with ESMTPSA id C9A0FD79 for ; Fri, 15 Jun 2012 00:10:04 -0700 (PDT) From: Dustin Sallings Mime-Version: 1.0 (Apple Message framework v1278) Content-Type: multipart/alternative; boundary="Apple-Mail=_B0CD70A8-AD83-4C14-853C-AA7A5051315A" Subject: Re: replication problems Date: Fri, 15 Jun 2012 00:10:04 -0700 In-Reply-To: To: dev@couchdb.apache.org References: <01C00F77-5275-4BD4-A9AF-004E8723943E@spy.net> Message-Id: <2EC65984-7E61-4523-8CE0-372BCB898AE7@spy.net> X-Mailer: Apple Mail (2.1278) X-Old-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED,HTML_MESSAGE, T_RP_MATCHES_RCVD autolearn=ham version=3.3.2 --Apple-Mail=_B0CD70A8-AD83-4C14-853C-AA7A5051315A Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=iso-8859-1 On Jun 14, 2012, at 11:28 PM, Benoit Chesneau wrote: > Ar you using _replicate or _replicator ? Anything interresting in = logs? I'm using _replicator (wonderful feature, I just kill the DB and = everything goes back the way I want it). Hmm... I do think I found some stuff digging through the logs. = This is the local DB I noticed not doing its thing, although there were = tons of errors all around this. Looks like the server got into some = kind of bad state and sort of half-crashed. [Thu, 14 Jun 2012 23:20:12 GMT] [error] [<0.133.0>] Replication = `ae601df0373da82d1b4a9ff741c8ba18+continuous` (`rpics` -> = `rpics-processed`) failed: = {{timeout,{gen_server,call,[<0.213.0>,{open_ref_count,<0.4 42.0>}]}}, {gen_server,call, [couch_server, {open,<<"rpics">>, = [{user_ctx,{user_ctx,null,[<<"_admin">>],undefined}}]}, infinity]}} [Thu, 14 Jun 2012 23:20:25 GMT] [error] [<0.383.0>] ** Generic server = <0.383.0> terminating=20 ** Last message in was {'EXIT',<0.384.0>, {{timeout, {gen_server,call, [<0.213.0>,{open_ref_count,<0.442.0>}]}}, {gen_server,call, [couch_server, {open,<<"cbstats">>, [{user_ctx, {user_ctx,null,[<<"_admin">>],undefined}}, {user_ctx, = {user_ctx,null,[<<"_admin">>],undefined}}]}, infinity]}}} ** When Server state =3D=3D {state,<0.272.0>,<0.384.0>,20, {httpdb, = "http://dustin:LOGGED_PASSWORD@single.couchbase.net/cbstats/", nil, [{"Accept","application/json"}, {"User-Agent","CouchDB/1.2.0"}], 30000, [{socket_options, [{keepalive,true},{nodelay,false}]}], 10,250,<0.273.0>,20}, = {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>, <0.290.0>,<0.286.0>,<0.367.0>, {db_header,6,984356,0, {860345646,{737369,975,640891414},59433736}, {860348005,738344,42056446}, {860352635,[],5737}, 0,nil,nil,1000}, 984356, {btree,<0.286.0>, {860345646,{737369,975,640891414},59433736}, #Fun, #Fun, #Fun, #Fun,snappy}, {btree,<0.286.0>, {860348005,738344,42056446}, #Fun, #Fun, #Fun, #Fun,snappy}, {btree,<0.286.0>, {860352635,[],5737}, #Fun, #Fun, #Fun,nil,snappy}, 984356,<<"cbstats">>, = "/Volumes/terror/db/couchdb/cbstats.couch",[],[], nil, {user_ctx,null,[<<"_admin">>],undefined}, nil,1000, [before_header,after_header,on_file_open], [{user_ctx, {user_ctx,null,[<<"_admin">>],undefined}}], snappy,nil,nil}, [],nil,nil,nil, {rep_stats,0,0,0,0,0}, nil,<0.385.0>, {batch,[],0}} ** Reason for termination =3D=3D=20 ** {noproc,{gen_server,call,[<0.367.0>,{drop,<0.383.0>},infinity]}} Scrolling to the beginning of the errors, I find this: [Thu, 14 Jun 2012 23:15:54 GMT] [error] [<0.164.0>] Replication = `543f76281e8d52d6ce5b51fddf0588e7+continuous` (`photo` -> = `http://dustin:*****@dustinphoto.couchone.com/photo/`) failed: = source_db_down [Thu, 14 Jun 2012 23:18:57 GMT] [info] [<0.358.0>] 127.0.0.1 - - GET = /_all_dbs 200 [Thu, 14 Jun 2012 23:19:52 GMT] [error] [<0.289.0>] ** Generic server = <0.289.0> terminating=20 ** Last message in was {update_docs,<0.272.0>,[], [{{doc, = <<"_local/c4cc070f896d7267e52ba012856fed4b">>, {0,[<<"346185">>]}, {[{<<"session_id">>, = <<"9fb3475683d44bb1e151031dd42cc59f">>}, {<<"source_last_seq">>,1419004}, {<<"replication_id_version">>,2}, {<<"history">>, [{[{<<"session_id">>, = <<"9fb3475683d44bb1e151031dd42cc59f">>}, {<<"start_time">>, <<"Thu, 14 Jun 2012 01:35:02 = GMT">>}, {<<"end_time">>, <<"Thu, 14 Jun 2012 23:15:29 = GMT">>}, {<<"start_last_seq">>,1410146}, {<<"end_last_seq">>,1419004}, {<<"recorded_seq">>,1419004}, {<<"missing_checked">>,8100}, {<<"missing_found">>,8100}, {<<"docs_read">>,8100}, {<<"docs_written">>,8100}, {<<"doc_write_failures">>,0}]}, {[{<<"session_id">>, = <<"3edd7c50327eab7ec0768451e34efa8b">>}, {<<"start_time">>, <<"Tue, 12 Jun 2012 05:51:17 = GMT">>}, {<<"end_time">>, <<"Tue, 12 Jun 2012 13:02:37 = GMT">>}, {<<"start_last_seq">>,1407186}, {<<"end_last_seq">>,1410146}, {<<"recorded_seq">>,1410146}, {<<"missing_checked">>,2583}, {<<"missing_found">>,2577}, {<<"docs_read">>,2577}, {<<"docs_written">>,2577}, {<<"doc_write_failures">>,0}]}, {[{<<"session_id">>, = <<"172de62044281a01b1584a9d099f42af">>}, {<<"start_time">>, <<"Mon, 11 Jun 2012 03:40:11 = GMT">>}, {<<"end_time">>, <<"Mon, 11 Jun 2012 15:16:24 = GMT">>}, {<<"start_last_seq">>,1405428}, {<<"end_last_seq">>,1407186}, {<<"recorded_seq">>,1407186}, {<<"missing_checked">>,1721}, {<<"missing_found">>,1721}, {<<"docs_read">>,1721}, {<<"docs_written">>,1721}, {<<"doc_write_failures">>,0}]}, {[{<<"session_id">>, = <<"e60a126a2036c5fab00a1249101820c8">>}, {<<"start_time">>, <<"Sat, 09 Jun 2012 07:47:22 = GMT">>}, {<<"end_time">>, <<"Sun, 10 Jun 2012 21:16:20 = GMT">>}, {<<"start_last_seq">>,1386289}, {<<"end_last_seq">>,1405428}, {<<"recorded_seq">>,1405428}, {<<"missing_checked">>,16977}, {<<"missing_found">>,16977}, {<<"docs_read">>,16977}, {<<"docs_written">>,16977}, {<<"doc_write_failures">>,0}]}, {[{<<"session_id">>, = <<"ef3e4333d340dcf73ddfa3fe8c720042">>}, {<<"start_time">>, <<"Mon, 04 Jun 2012 02:39:44 = GMT">>}, {<<"end_time">>, <<"Mon, 04 Jun 2012 12:35:50 = GMT">>}, {<<"start_last_seq">>,1384738}, {<<"end_last_seq">>,1386289}, {<<"recorded_seq">>,1386289}, {<<"missing_checked">>,1551}, {<<"missing_found">>,1550}, {<<"docs_read">>,1550}, {<<"docs_written">>,1550}, {<<"doc_write_failures">>,0}]}, {[{<<"session_id">>, = <<"d5123a3caf462794aaf5a47be1bb3b6e">>}, {<<"start_time">>, <<"Wed, 30 May 2012 20:41:43 = GMT">>}, {<<"end_time">>, <<"Mon, 04 Jun 2012 02:37:33 = GMT">>}, {<<"start_last_seq">>,1372404}, {<<"end_last_seq">>,1384738}, {<<"recorded_seq">>,1384738}, {<<"missing_checked">>,12334}, {<<"missing_found">>,12333}, {<<"docs_read">>,12333}, {<<"docs_written">>,12333}, {<<"doc_write_failures">>,0}]}, {[{<<"session_id">>, = <<"52a16e8832f70dc094f6fff5e9b7d75b">>}, {<<"start_time">>, <<"Sun, 27 May 2012 23:36:41 = GMT">>}, {<<"end_time">>, <<"Wed, 30 May 2012 20:40:14 = GMT">>}, {<<"start_last_seq">>,1361049}, {<<"end_last_seq">>,1372404}, {<<"recorded_seq">>,1372404}, {<<"missing_checked">>,11355}, {<<"missing_found">>,11355}, {<<"docs_read">>,11355}, {<<"docs_written">>,11355}, {<<"doc_write_failures">>,0}]}, [...lots of these...] [],false,[]}, #Ref<0.0.15.159973>}], false,false} ** When Server state =3D=3D = {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>, <0.290.0>,<0.286.0>,<0.367.0>, {db_header,6,992456,0, = {943280145,{744250,975,647546641},60017672}, {943282327,745225,42485979}, {943267963,[],5753}, 0,nil,nil,1000}, 992456, {btree,<0.286.0>, = {943280145,{744250,975,647546641},60017672}, #Fun, #Fun, #Fun, = #Fun,snappy}, {btree,<0.286.0>, {943282327,745225,42485979}, #Fun, #Fun, #Fun, = #Fun,snappy}, {btree,<0.286.0>, {943267963,[],5753}, #Fun, #Fun, = #Fun,nil,snappy}, 992456,<<"cbstats">>, = "/Volumes/terror/db/couchdb/cbstats.couch",[],[], nil, {user_ctx,null,[],undefined}, nil,1000, [before_header,after_header,on_file_open], [{user_ctx, = {user_ctx,null,[<<"_admin">>],undefined}}], snappy,nil,nil} ** Reason for termination =3D=3D=20 ** {timeout, {gen_server,call, [<0.288.0>, {db_updated, = {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>,<0.290.0>, <0.286.0>,<0.367.0>, {db_header,6,992456,0, {943280145,{744250,975,647546641},60017672}, {943282327,745225,42485979}, {943267963,[],5753}, 0,nil,nil,1000}, 992456, {btree,<0.286.0>, {943280145,{744250,975,647546641},60017672}, #Fun, #Fun, #Fun, #Fun,snappy}, {btree,<0.286.0>, {943282327,745225,42485979}, #Fun, #Fun, #Fun, #Fun,snappy}, {btree,<0.286.0>, {943284347,[],5756}, #Fun, #Fun, #Fun,nil,snappy}, 992456,<<"cbstats">>, = "/Volumes/terror/db/couchdb/cbstats.couch",[],[],nil, {user_ctx,null,[],undefined}, #Ref<0.0.15.160107>,1000, [before_header,after_header,on_file_open], = [{user_ctx,{user_ctx,null,[<<"_admin">>],undefined}}], snappy,nil,nil}}]}} --=20 dustin sallings --Apple-Mail=_B0CD70A8-AD83-4C14-853C-AA7A5051315A--