incubator-couchdb-dev mailing list archives

From Bob Dionne <dio...@dionne-associates.com>
Subject Re: replication problems
Date Thu, 11 Oct 2012 13:58:40 GMT
Incorporating a unique id from the source and target seems like a good way to go, but I'm wondering
if an id from an ini file will work in the clustered BigCouch case. Would an API-level request work
better? Something the replicator would interrogate for both the source and the target.
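
To make the idea concrete, here is a rough sketch of a replication id built from per-server uuids instead of host:port. This is not CouchDB's actual replication-id algorithm: the function name, the choice of input fields, and the use of MD5 are all assumptions made for illustration, and how the uuids are obtained (ini file, welcome message, or a dedicated API call) is left open.

```python
import hashlib


def replication_id(source_uuid, source_db, target_uuid, target_db, continuous=False):
    """Hypothetical sketch: derive a stable id from server uuids and db names.

    Host names and ports are deliberately not part of the input, so the id
    stays the same when a server is reached under a different address.
    """
    parts = [source_uuid, source_db, target_uuid, target_db]
    # A newline separator keeps the four fields from running into each other.
    digest = hashlib.md5("\n".join(parts).encode("utf-8")).hexdigest()
    # Mirror the "+continuous" suffix seen in the ids quoted in this thread.
    return digest + ("+continuous" if continuous else "")
```

With an id of this shape, the node that runs the replication does not contribute to it at all, which is the property the thread is after.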


On Oct 11, 2012, at 5:42 AM, Robert Newson <robert.newson@gmail.com> wrote:

> I'll note here that the attached patch is wrong. It uses a single uuid
> from the node running replication, which might not be the source or
> target. Instead, the uuid of source and target must be retrieved and
> used instead of the host:port. Jason's suggestion to add the uuid
> (stored in the ini file) to the welcome message sounds really good to
> me.
> 
> Can't attach this to the ticket today as I don't have my Jira creds.
> 
> Sent from the ocean floor
> 
> On 10 Oct 2012, at 21:40, Jan Lehnardt <jan@apache.org> wrote:
> 
>> flagged.
>> 
>> On Oct 10, 2012, at 22:34 , Robert Newson <robert.newson@gmail.com> wrote:
>> 
>>> Jan,
>>> 
>>> Flag that as fix-for 1.3? I don't have my creds on my phone to do it.
>>> 
>>> I like the ini uuid idea best, modelled after the cookie with secret.
>>> If we have the uuid, we'd omit host name as well as port, right?
>>> 
>>> Sent from the ocean floor
>>> 
>>> On 10 Oct 2012, at 21:12, Jan Lehnardt <jan@apache.org> wrote:
>>> 
>>>> Filipe tells me this is https://issues.apache.org/jira/browse/COUCHDB-1259
>>>> 
>>>> Cheers
>>>> Jan
>>>> --
>>>> 
>>>> On Oct 4, 2012, at 02:28 , Dustin Sallings <dustin@spy.net> wrote:
>>>> 
>>>>> 
>>>>> I'm bringing this back up as requested.  I'm currently in both the "not replicating interesting things" and the "has duplicate replicates" states at once.  I think the stuff below shows the "not replicating" stuff.
>>>>> 
>>>>> Active tasks shows the other (these are based on replicator DB documents; example below):
>>>>> 
>>>>> [
>>>>> {
>>>>>    "checkpointed_source_seq": 2022317,
>>>>>    "continuous": true,
>>>>>    "doc_id": "cbstats-from-dogbowl",
>>>>>    "doc_write_failures": 0,
>>>>>    "docs_read": 300,
>>>>>    "docs_written": 300,
>>>>>    "missing_revisions_found": 300,
>>>>>    "pid": "<0.10466.12>",
>>>>>    "progress": 100,
>>>>>    "replication_id": "50daecd0a29f4b7e5d102990831f3d64+continuous",
>>>>>    "revisions_checked": 304,
>>>>>    "source": "http://dustin:*****@single.couchbase.net/cbstats/",
>>>>>    "source_seq": 2022317,
>>>>>    "started_on": 1349309457,
>>>>>    "target": "cbstats",
>>>>>    "type": "replication",
>>>>>    "updated_on": 1349310442
>>>>> },
>>>>> {
>>>>>    "checkpointed_source_seq": 2022317,
>>>>>    "continuous": true,
>>>>>    "doc_id": "cbstats-from-dogbowl",
>>>>>    "doc_write_failures": 0,
>>>>>    "docs_read": 62,
>>>>>    "docs_written": 62,
>>>>>    "missing_revisions_found": 62,
>>>>>    "pid": "<0.11019.12>",
>>>>>    "progress": 100,
>>>>>    "replication_id": "411e341d5aa9a3fe636cf4ea8ba71720+continuous",
>>>>>    "revisions_checked": 304,
>>>>>    "source": "http://dustin:*****@single.couchbase.net/cbstats/",
>>>>>    "source_seq": 2022317,
>>>>>    "started_on": 1349309471,
>>>>>    "target": "cbstats",
>>>>>    "type": "replication",
>>>>>    "updated_on": 1349310443
>>>>> },
>>>>> {
>>>>>    "checkpointed_source_seq": 107068,
>>>>>    "continuous": true,
>>>>>    "doc_id": "gerrit-from-prod",
>>>>>    "doc_write_failures": 0,
>>>>>    "docs_read": 22,
>>>>>    "docs_written": 22,
>>>>>    "missing_revisions_found": 22,
>>>>>    "pid": "<0.11086.12>",
>>>>>    "progress": 100,
>>>>>    "replication_id": "4a21031dac0d81637a23c32bad620be9+continuous",
>>>>>    "revisions_checked": 26,
>>>>>    "source": "http://dustinphoto.iriscouch.com/gerrit/",
>>>>>    "source_seq": 107068,
>>>>>    "started_on": 1349309487,
>>>>>    "target": "gerrit",
>>>>>    "type": "replication",
>>>>>    "updated_on": 1349310445
>>>>> },
>>>>> {
>>>>>    "checkpointed_source_seq": 107068,
>>>>>    "continuous": true,
>>>>>    "doc_id": "gerrit-from-prod",
>>>>>    "doc_write_failures": 0,
>>>>>    "docs_read": 17,
>>>>>    "docs_written": 17,
>>>>>    "missing_revisions_found": 17,
>>>>>    "pid": "<0.11107.12>",
>>>>>    "progress": 100,
>>>>>    "replication_id": "b4ad5d3f2e5b78670e4c8364b18000e9+continuous",
>>>>>    "revisions_checked": 26,
>>>>>    "source": "http://dustinphoto.iriscouch.com/gerrit/",
>>>>>    "source_seq": 107068,
>>>>>    "started_on": 1349309488,
>>>>>    "target": "gerrit",
>>>>>    "type": "replication",
>>>>>    "updated_on": 1349310445
>>>>> }
>>>>> ]
>>>>> 
>>>>> 
>>>>> The replicator document for the latter, for example, is this:
>>>>> 
>>>>> {
>>>>> "_id": "gerrit-from-prod",
>>>>> "_rev": "2235-36de10fb757581a1782dacbb26ee4809",
>>>>> "source": "http://dustinphoto.iriscouch.com/gerrit",
>>>>> "target": "gerrit",
>>>>> "continuous": true,
>>>>> "user_ctx": {
>>>>>   "roles": [
>>>>>       "_admin"
>>>>>   ]
>>>>> },
>>>>> "_replication_state_time": "2012-10-03T17:11:27-07:00",
>>>>> "_replication_id": "b4ad5d3f2e5b78670e4c8364b18000e9",
>>>>> "_replication_state": "triggered"
>>>>> }
>>>>> 
>>>>> 
>>>>> Begin forwarded message:
>>>>> 
>>>>>> From: Dustin Sallings <dustin@spy.net>
>>>>>> Subject: Re: replication problems
>>>>>> Date: June 15, 2012 0:10:04 PDT
>>>>>> To: dev@couchdb.apache.org
>>>>>> Reply-To: dev@couchdb.apache.org
>>>>>> 
>>>>>> 
>>>>>> On Jun 14, 2012, at 11:28 PM, Benoit Chesneau wrote:
>>>>>> 
>>>>>>> Are you using _replicate or _replicator? Anything interesting in the logs?
>>>>>> 
>>>>>> 
>>>>>> I'm using _replicator (wonderful feature, I just kill the DB and everything goes back the way I want it).
>>>>>> 
>>>>>> Hmm...  I do think I found some stuff digging through the logs.  This is the local DB I noticed not doing its thing, although there were tons of errors all around this.  Looks like the server got into some kind of bad state and sort of half-crashed.
>>>>>> 
>>>>>> 
>>>>>> [Thu, 14 Jun 2012 23:20:12 GMT] [error] [<0.133.0>] Replication `ae601df0373da82d1b4a9ff741c8ba18+continuous` (`rpics` -> `rpics-processed`) failed: {{timeout,{gen_server,call,[<0.213.0>,{open_ref_count,<0.442.0>}]}},
>>>>>> {gen_server,call,
>>>>>>        [couch_server,
>>>>>>         {open,<<"rpics">>,
>>>>>>               [{user_ctx,{user_ctx,null,[<<"_admin">>],undefined}}]},
>>>>>>         infinity]}}
>>>>>> [Thu, 14 Jun 2012 23:20:25 GMT] [error] [<0.383.0>] ** Generic server <0.383.0> terminating
>>>>>> ** Last message in was {'EXIT',<0.384.0>,
>>>>>>                   {{timeout,
>>>>>>                     {gen_server,call,
>>>>>>                      [<0.213.0>,{open_ref_count,<0.442.0>}]}},
>>>>>>                    {gen_server,call,
>>>>>>                     [couch_server,
>>>>>>                      {open,<<"cbstats">>,
>>>>>>                       [{user_ctx,
>>>>>>                         {user_ctx,null,[<<"_admin">>],undefined}},
>>>>>>                        {user_ctx,
>>>>>>                         {user_ctx,null,[<<"_admin">>],undefined}}]},
>>>>>>                      infinity]}}}
>>>>>> 
>>>>>> ** When Server state == {state,<0.272.0>,<0.384.0>,20,
>>>>>>                    {httpdb,
>>>>>>                     "http://dustin:LOGGED_PASSWORD@single.couchbase.net/cbstats/",
>>>>>>                     nil,
>>>>>>                     [{"Accept","application/json"},
>>>>>>                      {"User-Agent","CouchDB/1.2.0"}],
>>>>>>                     30000,
>>>>>>                     [{socket_options,
>>>>>>                       [{keepalive,true},{nodelay,false}]}],
>>>>>>                     10,250,<0.273.0>,20},
>>>>>>                    {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>,
>>>>>>                     <0.290.0>,<0.286.0>,<0.367.0>,
>>>>>>                     {db_header,6,984356,0,
>>>>>>                      {860345646,{737369,975,640891414},59433736},
>>>>>>                      {860348005,738344,42056446},
>>>>>>                      {860352635,[],5737},
>>>>>>                      0,nil,nil,1000},
>>>>>>                     984356,
>>>>>>                     {btree,<0.286.0>,
>>>>>>                      {860345646,{737369,975,640891414},59433736},
>>>>>>                      #Fun<couch_db_updater.10.57960608>,
>>>>>>                      #Fun<couch_db_updater.11.57960608>,
>>>>>>                      #Fun<couch_btree.5.133731799>,
>>>>>>                      #Fun<couch_db_updater.12.57960608>,snappy},
>>>>>>                     {btree,<0.286.0>,
>>>>>>                      {860348005,738344,42056446},
>>>>>>                      #Fun<couch_db_updater.13.57960608>,
>>>>>>                      #Fun<couch_db_updater.14.57960608>,
>>>>>>                      #Fun<couch_btree.5.133731799>,
>>>>>>                      #Fun<couch_db_updater.15.57960608>,snappy},
>>>>>>                     {btree,<0.286.0>,
>>>>>>                      {860352635,[],5737},
>>>>>>                      #Fun<couch_btree.3.133731799>,
>>>>>>                      #Fun<couch_btree.4.133731799>,
>>>>>>                      #Fun<couch_btree.5.133731799>,nil,snappy},
>>>>>>                     984356,<<"cbstats">>,
>>>>>>                     "/Volumes/terror/db/couchdb/cbstats.couch",[],[],
>>>>>>                     nil,
>>>>>>                     {user_ctx,null,[<<"_admin">>],undefined},
>>>>>>                     nil,1000,
>>>>>>                     [before_header,after_header,on_file_open],
>>>>>>                     [{user_ctx,
>>>>>>                       {user_ctx,null,[<<"_admin">>],undefined}}],
>>>>>>                     snappy,nil,nil},
>>>>>>                    [],nil,nil,nil,
>>>>>>                    {rep_stats,0,0,0,0,0},
>>>>>>                    nil,<0.385.0>,
>>>>>>                    {batch,[],0}}
>>>>>> ** Reason for termination ==
>>>>>> ** {noproc,{gen_server,call,[<0.367.0>,{drop,<0.383.0>},infinity]}}
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Scrolling to the beginning of the errors, I find this:
>>>>>> 
>>>>>> 
>>>>>> [Thu, 14 Jun 2012 23:15:54 GMT] [error] [<0.164.0>] Replication `543f76281e8d52d6ce5b51fddf0588e7+continuous` (`photo` -> `http://dustin:*****@dustinphoto.couchone.com/photo/`) failed: source_db_down
>>>>>> [Thu, 14 Jun 2012 23:18:57 GMT] [info] [<0.358.0>] 127.0.0.1 - - GET /_all_dbs 200
>>>>>> [Thu, 14 Jun 2012 23:19:52 GMT] [error] [<0.289.0>] ** Generic server <0.289.0> terminating
>>>>>> ** Last message in was {update_docs,<0.272.0>,[],
>>>>>>                      [{{doc,
>>>>>>                            <<"_local/c4cc070f896d7267e52ba012856fed4b">>,
>>>>>>                            {0,[<<"346185">>]},
>>>>>>                            {[{<<"session_id">>,
>>>>>>                               <<"9fb3475683d44bb1e151031dd42cc59f">>},
>>>>>>                              {<<"source_last_seq">>,1419004},
>>>>>>                              {<<"replication_id_version">>,2},
>>>>>>                              {<<"history">>,
>>>>>>                               [{[{<<"session_id">>,
>>>>>>                                   <<"9fb3475683d44bb1e151031dd42cc59f">>},
>>>>>>                                  {<<"start_time">>,
>>>>>>                                   <<"Thu, 14 Jun 2012 01:35:02 GMT">>},
>>>>>>                                  {<<"end_time">>,
>>>>>>                                   <<"Thu, 14 Jun 2012 23:15:29 GMT">>},
>>>>>>                                  {<<"start_last_seq">>,1410146},
>>>>>>                                  {<<"end_last_seq">>,1419004},
>>>>>>                                  {<<"recorded_seq">>,1419004},
>>>>>>                                  {<<"missing_checked">>,8100},
>>>>>>                                  {<<"missing_found">>,8100},
>>>>>>                                  {<<"docs_read">>,8100},
>>>>>>                                  {<<"docs_written">>,8100},
>>>>>>                                  {<<"doc_write_failures">>,0}]},
>>>>>>                                {[{<<"session_id">>,
>>>>>>                                   <<"3edd7c50327eab7ec0768451e34efa8b">>},
>>>>>>                                  {<<"start_time">>,
>>>>>>                                   <<"Tue, 12 Jun 2012 05:51:17 GMT">>},
>>>>>>                                  {<<"end_time">>,
>>>>>>                                   <<"Tue, 12 Jun 2012 13:02:37 GMT">>},
>>>>>>                                  {<<"start_last_seq">>,1407186},
>>>>>>                                  {<<"end_last_seq">>,1410146},
>>>>>>                                  {<<"recorded_seq">>,1410146},
>>>>>>                                  {<<"missing_checked">>,2583},
>>>>>>                                  {<<"missing_found">>,2577},
>>>>>>                                  {<<"docs_read">>,2577},
>>>>>>                                  {<<"docs_written">>,2577},
>>>>>>                                  {<<"doc_write_failures">>,0}]},
>>>>>>                                {[{<<"session_id">>,
>>>>>>                                   <<"172de62044281a01b1584a9d099f42af">>},
>>>>>>                                  {<<"start_time">>,
>>>>>>                                   <<"Mon, 11 Jun 2012 03:40:11 GMT">>},
>>>>>>                                  {<<"end_time">>,
>>>>>>                                   <<"Mon, 11 Jun 2012 15:16:24 GMT">>},
>>>>>>                                  {<<"start_last_seq">>,1405428},
>>>>>>                                  {<<"end_last_seq">>,1407186},
>>>>>>                                  {<<"recorded_seq">>,1407186},
>>>>>>                                  {<<"missing_checked">>,1721},
>>>>>>                                  {<<"missing_found">>,1721},
>>>>>>                                  {<<"docs_read">>,1721},
>>>>>>                                  {<<"docs_written">>,1721},
>>>>>>                                  {<<"doc_write_failures">>,0}]},
>>>>>>                                {[{<<"session_id">>,
>>>>>>                                   <<"e60a126a2036c5fab00a1249101820c8">>},
>>>>>>                                  {<<"start_time">>,
>>>>>>                                   <<"Sat, 09 Jun 2012 07:47:22 GMT">>},
>>>>>>                                  {<<"end_time">>,
>>>>>>                                   <<"Sun, 10 Jun 2012 21:16:20 GMT">>},
>>>>>>                                  {<<"start_last_seq">>,1386289},
>>>>>>                                  {<<"end_last_seq">>,1405428},
>>>>>>                                  {<<"recorded_seq">>,1405428},
>>>>>>                                  {<<"missing_checked">>,16977},
>>>>>>                                  {<<"missing_found">>,16977},
>>>>>>                                  {<<"docs_read">>,16977},
>>>>>>                                  {<<"docs_written">>,16977},
>>>>>>                                  {<<"doc_write_failures">>,0}]},
>>>>>>                                {[{<<"session_id">>,
>>>>>>                                   <<"ef3e4333d340dcf73ddfa3fe8c720042">>},
>>>>>>                                  {<<"start_time">>,
>>>>>>                                   <<"Mon, 04 Jun 2012 02:39:44 GMT">>},
>>>>>>                                  {<<"end_time">>,
>>>>>>                                   <<"Mon, 04 Jun 2012 12:35:50 GMT">>},
>>>>>>                                  {<<"start_last_seq">>,1384738},
>>>>>>                                  {<<"end_last_seq">>,1386289},
>>>>>>                                  {<<"recorded_seq">>,1386289},
>>>>>>                                  {<<"missing_checked">>,1551},
>>>>>>                                  {<<"missing_found">>,1550},
>>>>>>                                  {<<"docs_read">>,1550},
>>>>>>                                  {<<"docs_written">>,1550},
>>>>>>                                  {<<"doc_write_failures">>,0}]},
>>>>>>                                {[{<<"session_id">>,
>>>>>>                                   <<"d5123a3caf462794aaf5a47be1bb3b6e">>},
>>>>>>                                  {<<"start_time">>,
>>>>>>                                   <<"Wed, 30 May 2012 20:41:43 GMT">>},
>>>>>>                                  {<<"end_time">>,
>>>>>>                                   <<"Mon, 04 Jun 2012 02:37:33 GMT">>},
>>>>>>                                  {<<"start_last_seq">>,1372404},
>>>>>>                                  {<<"end_last_seq">>,1384738},
>>>>>>                                  {<<"recorded_seq">>,1384738},
>>>>>>                                  {<<"missing_checked">>,12334},
>>>>>>                                  {<<"missing_found">>,12333},
>>>>>>                                  {<<"docs_read">>,12333},
>>>>>>                                  {<<"docs_written">>,12333},
>>>>>>                                  {<<"doc_write_failures">>,0}]},
>>>>>>                                {[{<<"session_id">>,
>>>>>>                                   <<"52a16e8832f70dc094f6fff5e9b7d75b">>},
>>>>>>                                  {<<"start_time">>,
>>>>>>                                   <<"Sun, 27 May 2012 23:36:41 GMT">>},
>>>>>>                                  {<<"end_time">>,
>>>>>>                                   <<"Wed, 30 May 2012 20:40:14 GMT">>},
>>>>>>                                  {<<"start_last_seq">>,1361049},
>>>>>>                                  {<<"end_last_seq">>,1372404},
>>>>>>                                  {<<"recorded_seq">>,1372404},
>>>>>>                                  {<<"missing_checked">>,11355},
>>>>>>                                  {<<"missing_found">>,11355},
>>>>>>                                  {<<"docs_read">>,11355},
>>>>>>                                  {<<"docs_written">>,11355},
>>>>>>                                  {<<"doc_write_failures">>,0}]},
>>>>>> [...lots of these...]
>>>>>> 
>>>>>>                            [],false,[]},
>>>>>>                        #Ref<0.0.15.159973>}],
>>>>>>                      false,false}
>>>>>> ** When Server state == {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>,
>>>>>>                       <0.290.0>,<0.286.0>,<0.367.0>,
>>>>>>                       {db_header,6,992456,0,
>>>>>>                           {943280145,{744250,975,647546641},60017672},
>>>>>>                           {943282327,745225,42485979},
>>>>>>                           {943267963,[],5753},
>>>>>>                           0,nil,nil,1000},
>>>>>>                       992456,
>>>>>>                       {btree,<0.286.0>,
>>>>>>                           {943280145,{744250,975,647546641},60017672},
>>>>>>                           #Fun<couch_db_updater.10.57960608>,
>>>>>>                           #Fun<couch_db_updater.11.57960608>,
>>>>>>                           #Fun<couch_btree.5.133731799>,
>>>>>>                           #Fun<couch_db_updater.12.57960608>,snappy},
>>>>>>                       {btree,<0.286.0>,
>>>>>>                           {943282327,745225,42485979},
>>>>>>                           #Fun<couch_db_updater.13.57960608>,
>>>>>>                           #Fun<couch_db_updater.14.57960608>,
>>>>>>                           #Fun<couch_btree.5.133731799>,
>>>>>>                           #Fun<couch_db_updater.15.57960608>,snappy},
>>>>>>                       {btree,<0.286.0>,
>>>>>>                           {943267963,[],5753},
>>>>>>                           #Fun<couch_btree.3.133731799>,
>>>>>>                           #Fun<couch_btree.4.133731799>,
>>>>>>                           #Fun<couch_btree.5.133731799>,nil,snappy},
>>>>>>                       992456,<<"cbstats">>,
>>>>>>                       "/Volumes/terror/db/couchdb/cbstats.couch",[],[],
>>>>>>                       nil,
>>>>>>                       {user_ctx,null,[],undefined},
>>>>>>                       nil,1000,
>>>>>>                       [before_header,after_header,on_file_open],
>>>>>>                       [{user_ctx,
>>>>>>                            {user_ctx,null,[<<"_admin">>],undefined}}],
>>>>>>                       snappy,nil,nil}
>>>>>> ** Reason for termination ==
>>>>>> ** {timeout,
>>>>>>  {gen_server,call,
>>>>>>      [<0.288.0>,
>>>>>>       {db_updated,
>>>>>>           {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>,<0.290.0>,
>>>>>>               <0.286.0>,<0.367.0>,
>>>>>>               {db_header,6,992456,0,
>>>>>>                   {943280145,{744250,975,647546641},60017672},
>>>>>>                   {943282327,745225,42485979},
>>>>>>                   {943267963,[],5753},
>>>>>>                   0,nil,nil,1000},
>>>>>>               992456,
>>>>>>               {btree,<0.286.0>,
>>>>>>                   {943280145,{744250,975,647546641},60017672},
>>>>>>                   #Fun<couch_db_updater.10.57960608>,
>>>>>>                   #Fun<couch_db_updater.11.57960608>,
>>>>>>                   #Fun<couch_btree.5.133731799>,
>>>>>>                   #Fun<couch_db_updater.12.57960608>,snappy},
>>>>>>               {btree,<0.286.0>,
>>>>>>                   {943282327,745225,42485979},
>>>>>>                   #Fun<couch_db_updater.13.57960608>,
>>>>>>                   #Fun<couch_db_updater.14.57960608>,
>>>>>>                   #Fun<couch_btree.5.133731799>,
>>>>>>                   #Fun<couch_db_updater.15.57960608>,snappy},
>>>>>>               {btree,<0.286.0>,
>>>>>>                   {943284347,[],5756},
>>>>>>                   #Fun<couch_btree.3.133731799>,
>>>>>>                   #Fun<couch_btree.4.133731799>,
>>>>>>                   #Fun<couch_btree.5.133731799>,nil,snappy},
>>>>>>               992456,<<"cbstats">>,
>>>>>>               "/Volumes/terror/db/couchdb/cbstats.couch",[],[],nil,
>>>>>>               {user_ctx,null,[],undefined},
>>>>>>               #Ref<0.0.15.160107>,1000,
>>>>>>               [before_header,after_header,on_file_open],
>>>>>>               [{user_ctx,{user_ctx,null,[<<"_admin">>],undefined}}],
>>>>>>               snappy,nil,nil}}]}}
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> dustin sallings
>>>>> 
>>>>> --
>>>>> dustin sallings
>> 
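
The duplicate-task symptom in the quoted active tasks output (two running tasks for one `doc_id`, each with a different `replication_id`) can be checked mechanically. The following is a minimal sketch, assuming the task list has already been fetched and parsed as JSON; the function name is hypothetical, and only the `type`, `doc_id` and `replication_id` fields shown in the quoted output are used.

```python
import json
from collections import defaultdict


def duplicate_replications(active_tasks):
    """Return {doc_id: [replication_id, ...]} for doc_ids with more than one task."""
    by_doc = defaultdict(list)
    for task in active_tasks:
        # Only replication tasks started from a replicator document carry a doc_id.
        if task.get("type") == "replication" and "doc_id" in task:
            by_doc[task["doc_id"]].append(task["replication_id"])
    return {doc: ids for doc, ids in by_doc.items() if len(ids) > 1}


# Trimmed copy of two of the tasks quoted above (only the fields the check needs).
tasks = json.loads("""[
  {"type": "replication", "doc_id": "gerrit-from-prod",
   "replication_id": "4a21031dac0d81637a23c32bad620be9+continuous"},
  {"type": "replication", "doc_id": "gerrit-from-prod",
   "replication_id": "b4ad5d3f2e5b78670e4c8364b18000e9+continuous"}
]""")
print(duplicate_replications(tasks))
```

Any `doc_id` that shows up in the result has more than one replication running for the same replicator document, which is the state described in the quoted report.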

