couchdb-dev mailing list archives

From Robert Newson <robert.new...@gmail.com>
Subject Re: replication problems
Date Thu, 11 Oct 2012 09:42:18 GMT
I'll note here that the attached patch is wrong. It uses a single uuid
from the node running replication, which might not be the source or
target. The uuids of the source and target must be retrieved and used
in place of the host:port. Jason's suggestion to add the uuid
(stored in the ini file) to the welcome message sounds really good to
me.
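
A minimal sketch of the idea, in Python for illustration only: the
replication ID is derived from the UUIDs of the source and target servers
rather than from host:port, so it does not depend on which node runs the
replication or on how the endpoints are addressed. The function name, the
choice of fields, and the use of MD5 over a JSON list are assumptions made
for this example; this is not CouchDB's actual algorithm. The
"+continuous" suffix mirrors the IDs shown in the quoted active tasks.

```python
import hashlib
import json


def replication_id(source_uuid, source_db, target_uuid, target_db, continuous=False):
    """Hypothetical ID scheme: hash the endpoint UUIDs, not host:port."""
    # Serialise the identifying fields in a fixed order so the digest is stable.
    payload = json.dumps([source_uuid, source_db, target_uuid, target_db])
    digest = hashlib.md5(payload.encode("utf-8")).hexdigest()
    # Append the suffix only for continuous replications.
    return digest + "+continuous" if continuous else digest


if __name__ == "__main__":
    # The UUID strings below are placeholders, not real server UUIDs.
    print(replication_id("uuid-of-source", "gerrit", "uuid-of-target", "gerrit", continuous=True))
```

Under this scheme, the same pair of servers yields the same ID even if the
source is reached through a different host name or port.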

Can't attach this to the ticket today as I don't have my Jira creds.

Sent from the ocean floor

On 10 Oct 2012, at 21:40, Jan Lehnardt <jan@apache.org> wrote:

> flagged.
>
> On Oct 10, 2012, at 22:34 , Robert Newson <robert.newson@gmail.com> wrote:
>
>> Jan,
>>
>> Flag that as fix-for 1.3? I don't have my creds on my phone to do it.
>>
>> I like the ini uuid idea best, modelled after the cookie with secret.
>> If we have the uuid, we'd omit host name as well as port, right?
>>
>> Sent from the ocean floor
>>
>> On 10 Oct 2012, at 21:12, Jan Lehnardt <jan@apache.org> wrote:
>>
>>> Filipe tells me this is https://issues.apache.org/jira/browse/COUCHDB-1259
>>>
>>> Cheers
>>> Jan
>>> --
>>>
>>> On Oct 4, 2012, at 02:28 , Dustin Sallings <dustin@spy.net> wrote:
>>>
>>>>
>>>>  I'm bringing this back up as requested.  I'm currently in both the
>>>> "not replicating interesting things" state and the "has duplicate
>>>> replications" state.  I think the stuff below shows the "not replicating"
>>>> problem.
>>>>
>>>>  Active tasks shows the other (these are based on replicator DB
>>>> documents; example below):
>>>>
>>>> [
>>>> {
>>>>     "checkpointed_source_seq": 2022317,
>>>>     "continuous": true,
>>>>     "doc_id": "cbstats-from-dogbowl",
>>>>     "doc_write_failures": 0,
>>>>     "docs_read": 300,
>>>>     "docs_written": 300,
>>>>     "missing_revisions_found": 300,
>>>>     "pid": "<0.10466.12>",
>>>>     "progress": 100,
>>>>     "replication_id": "50daecd0a29f4b7e5d102990831f3d64+continuous",
>>>>     "revisions_checked": 304,
>>>>     "source": "http://dustin:*****@single.couchbase.net/cbstats/",
>>>>     "source_seq": 2022317,
>>>>     "started_on": 1349309457,
>>>>     "target": "cbstats",
>>>>     "type": "replication",
>>>>     "updated_on": 1349310442
>>>> },
>>>> {
>>>>     "checkpointed_source_seq": 2022317,
>>>>     "continuous": true,
>>>>     "doc_id": "cbstats-from-dogbowl",
>>>>     "doc_write_failures": 0,
>>>>     "docs_read": 62,
>>>>     "docs_written": 62,
>>>>     "missing_revisions_found": 62,
>>>>     "pid": "<0.11019.12>",
>>>>     "progress": 100,
>>>>     "replication_id": "411e341d5aa9a3fe636cf4ea8ba71720+continuous",
>>>>     "revisions_checked": 304,
>>>>     "source": "http://dustin:*****@single.couchbase.net/cbstats/",
>>>>     "source_seq": 2022317,
>>>>     "started_on": 1349309471,
>>>>     "target": "cbstats",
>>>>     "type": "replication",
>>>>     "updated_on": 1349310443
>>>> },
>>>> {
>>>>     "checkpointed_source_seq": 107068,
>>>>     "continuous": true,
>>>>     "doc_id": "gerrit-from-prod",
>>>>     "doc_write_failures": 0,
>>>>     "docs_read": 22,
>>>>     "docs_written": 22,
>>>>     "missing_revisions_found": 22,
>>>>     "pid": "<0.11086.12>",
>>>>     "progress": 100,
>>>>     "replication_id": "4a21031dac0d81637a23c32bad620be9+continuous",
>>>>     "revisions_checked": 26,
>>>>     "source": "http://dustinphoto.iriscouch.com/gerrit/",
>>>>     "source_seq": 107068,
>>>>     "started_on": 1349309487,
>>>>     "target": "gerrit",
>>>>     "type": "replication",
>>>>     "updated_on": 1349310445
>>>> },
>>>> {
>>>>     "checkpointed_source_seq": 107068,
>>>>     "continuous": true,
>>>>     "doc_id": "gerrit-from-prod",
>>>>     "doc_write_failures": 0,
>>>>     "docs_read": 17,
>>>>     "docs_written": 17,
>>>>     "missing_revisions_found": 17,
>>>>     "pid": "<0.11107.12>",
>>>>     "progress": 100,
>>>>     "replication_id": "b4ad5d3f2e5b78670e4c8364b18000e9+continuous",
>>>>     "revisions_checked": 26,
>>>>     "source": "http://dustinphoto.iriscouch.com/gerrit/",
>>>>     "source_seq": 107068,
>>>>     "started_on": 1349309488,
>>>>     "target": "gerrit",
>>>>     "type": "replication",
>>>>     "updated_on": 1349310445
>>>> }
>>>> ]
>>>>
>>>>
>>>>  The replicator document for the latter, for example, is this:
>>>>
>>>> {
>>>> "_id": "gerrit-from-prod",
>>>> "_rev": "2235-36de10fb757581a1782dacbb26ee4809",
>>>> "source": "http://dustinphoto.iriscouch.com/gerrit",
>>>> "target": "gerrit",
>>>> "continuous": true,
>>>> "user_ctx": {
>>>>    "roles": [
>>>>        "_admin"
>>>>    ]
>>>> },
>>>> "_replication_state_time": "2012-10-03T17:11:27-07:00",
>>>> "_replication_id": "b4ad5d3f2e5b78670e4c8364b18000e9",
>>>> "_replication_state": "triggered"
>>>> }
>>>>
>>>>
>>>> Begin forwarded message:
>>>>
>>>>> From: Dustin Sallings <dustin@spy.net>
>>>>> Subject: Re: replication problems
>>>>> Date: June 15, 2012 0:10:04 PDT
>>>>> To: dev@couchdb.apache.org
>>>>> Reply-To: dev@couchdb.apache.org
>>>>>
>>>>>
>>>>> On Jun 14, 2012, at 11:28 PM, Benoit Chesneau wrote:
>>>>>
>>>>>> Are you using _replicate or _replicator? Anything interesting in the
>>>>>> logs?
>>>>>
>>>>>
>>>>>  I'm using _replicator (wonderful feature, I just kill the DB and
>>>>> everything goes back the way I want it).
>>>>>
>>>>>  Hmm...  I do think I found some stuff digging through the logs.  This
>>>>> is the local DB I noticed not doing its thing, although there were tons
>>>>> of errors all around this.  Looks like the server got into some kind of
>>>>> bad state and sort of half-crashed.
>>>>>
>>>>>
>>>>> [Thu, 14 Jun 2012 23:20:12 GMT] [error] [<0.133.0>] Replication `ae601df0373da82d1b4a9ff741c8ba18+continuous` (`rpics` -> `rpics-processed`) failed: {{timeout,{gen_server,call,[<0.213.0>,{open_ref_count,<0.442.0>}]}},
>>>>> {gen_server,call,
>>>>>         [couch_server,
>>>>>          {open,<<"rpics">>,
>>>>>                [{user_ctx,{user_ctx,null,[<<"_admin">>],undefined}}]},
>>>>>          infinity]}}
>>>>> [Thu, 14 Jun 2012 23:20:25 GMT] [error] [<0.383.0>] ** Generic server <0.383.0> terminating
>>>>> ** Last message in was {'EXIT',<0.384.0>,
>>>>>                    {{timeout,
>>>>>                      {gen_server,call,
>>>>>                       [<0.213.0>,{open_ref_count,<0.442.0>}]}},
>>>>>                     {gen_server,call,
>>>>>                      [couch_server,
>>>>>                       {open,<<"cbstats">>,
>>>>>                        [{user_ctx,
>>>>>                          {user_ctx,null,[<<"_admin">>],undefined}},
>>>>>                         {user_ctx,
>>>>>                          {user_ctx,null,[<<"_admin">>],undefined}}]},
>>>>>                       infinity]}}}
>>>>>
>>>>> ** When Server state == {state,<0.272.0>,<0.384.0>,20,
>>>>>                     {httpdb,
>>>>>                      "http://dustin:LOGGED_PASSWORD@single.couchbase.net/cbstats/",
>>>>>                      nil,
>>>>>                      [{"Accept","application/json"},
>>>>>                       {"User-Agent","CouchDB/1.2.0"}],
>>>>>                      30000,
>>>>>                      [{socket_options,
>>>>>                        [{keepalive,true},{nodelay,false}]}],
>>>>>                      10,250,<0.273.0>,20},
>>>>>                     {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>,
>>>>>                      <0.290.0>,<0.286.0>,<0.367.0>,
>>>>>                      {db_header,6,984356,0,
>>>>>                       {860345646,{737369,975,640891414},59433736},
>>>>>                       {860348005,738344,42056446},
>>>>>                       {860352635,[],5737},
>>>>>                       0,nil,nil,1000},
>>>>>                      984356,
>>>>>                      {btree,<0.286.0>,
>>>>>                       {860345646,{737369,975,640891414},59433736},
>>>>>                       #Fun<couch_db_updater.10.57960608>,
>>>>>                       #Fun<couch_db_updater.11.57960608>,
>>>>>                       #Fun<couch_btree.5.133731799>,
>>>>>                       #Fun<couch_db_updater.12.57960608>,snappy},
>>>>>                      {btree,<0.286.0>,
>>>>>                       {860348005,738344,42056446},
>>>>>                       #Fun<couch_db_updater.13.57960608>,
>>>>>                       #Fun<couch_db_updater.14.57960608>,
>>>>>                       #Fun<couch_btree.5.133731799>,
>>>>>                       #Fun<couch_db_updater.15.57960608>,snappy},
>>>>>                      {btree,<0.286.0>,
>>>>>                       {860352635,[],5737},
>>>>>                       #Fun<couch_btree.3.133731799>,
>>>>>                       #Fun<couch_btree.4.133731799>,
>>>>>                       #Fun<couch_btree.5.133731799>,nil,snappy},
>>>>>                      984356,<<"cbstats">>,
>>>>>                      "/Volumes/terror/db/couchdb/cbstats.couch",[],[],
>>>>>                      nil,
>>>>>                      {user_ctx,null,[<<"_admin">>],undefined},
>>>>>                      nil,1000,
>>>>>                      [before_header,after_header,on_file_open],
>>>>>                      [{user_ctx,
>>>>>                        {user_ctx,null,[<<"_admin">>],undefined}}],
>>>>>                      snappy,nil,nil},
>>>>>                     [],nil,nil,nil,
>>>>>                     {rep_stats,0,0,0,0,0},
>>>>>                     nil,<0.385.0>,
>>>>>                     {batch,[],0}}
>>>>> ** Reason for termination ==
>>>>> ** {noproc,{gen_server,call,[<0.367.0>,{drop,<0.383.0>},infinity]}}
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>  Scrolling to the beginning of the errors, I find this:
>>>>>
>>>>>
>>>>> [Thu, 14 Jun 2012 23:15:54 GMT] [error] [<0.164.0>] Replication `543f76281e8d52d6ce5b51fddf0588e7+continuous` (`photo` -> `http://dustin:*****@dustinphoto.couchone.com/photo/`) failed: source_db_down
>>>>> [Thu, 14 Jun 2012 23:18:57 GMT] [info] [<0.358.0>] 127.0.0.1 - - GET /_all_dbs 200
>>>>> [Thu, 14 Jun 2012 23:19:52 GMT] [error] [<0.289.0>] ** Generic server <0.289.0> terminating
>>>>> ** Last message in was {update_docs,<0.272.0>,[],
>>>>>                       [{{doc,
>>>>>                             <<"_local/c4cc070f896d7267e52ba012856fed4b">>,
>>>>>                             {0,[<<"346185">>]},
>>>>>                             {[{<<"session_id">>,
>>>>>                                <<"9fb3475683d44bb1e151031dd42cc59f">>},
>>>>>                               {<<"source_last_seq">>,1419004},
>>>>>                               {<<"replication_id_version">>,2},
>>>>>                               {<<"history">>,
>>>>>                                [{[{<<"session_id">>,
>>>>>                                    <<"9fb3475683d44bb1e151031dd42cc59f">>},
>>>>>                                   {<<"start_time">>,
>>>>>                                    <<"Thu, 14 Jun 2012 01:35:02 GMT">>},
>>>>>                                   {<<"end_time">>,
>>>>>                                    <<"Thu, 14 Jun 2012 23:15:29 GMT">>},
>>>>>                                   {<<"start_last_seq">>,1410146},
>>>>>                                   {<<"end_last_seq">>,1419004},
>>>>>                                   {<<"recorded_seq">>,1419004},
>>>>>                                   {<<"missing_checked">>,8100},
>>>>>                                   {<<"missing_found">>,8100},
>>>>>                                   {<<"docs_read">>,8100},
>>>>>                                   {<<"docs_written">>,8100},
>>>>>                                   {<<"doc_write_failures">>,0}]},
>>>>>                                 {[{<<"session_id">>,
>>>>>                                    <<"3edd7c50327eab7ec0768451e34efa8b">>},
>>>>>                                   {<<"start_time">>,
>>>>>                                    <<"Tue, 12 Jun 2012 05:51:17 GMT">>},
>>>>>                                   {<<"end_time">>,
>>>>>                                    <<"Tue, 12 Jun 2012 13:02:37 GMT">>},
>>>>>                                   {<<"start_last_seq">>,1407186},
>>>>>                                   {<<"end_last_seq">>,1410146},
>>>>>                                   {<<"recorded_seq">>,1410146},
>>>>>                                   {<<"missing_checked">>,2583},
>>>>>                                   {<<"missing_found">>,2577},
>>>>>                                   {<<"docs_read">>,2577},
>>>>>                                   {<<"docs_written">>,2577},
>>>>>                                   {<<"doc_write_failures">>,0}]},
>>>>>                                 {[{<<"session_id">>,
>>>>>                                    <<"172de62044281a01b1584a9d099f42af">>},
>>>>>                                   {<<"start_time">>,
>>>>>                                    <<"Mon, 11 Jun 2012 03:40:11 GMT">>},
>>>>>                                   {<<"end_time">>,
>>>>>                                    <<"Mon, 11 Jun 2012 15:16:24 GMT">>},
>>>>>                                   {<<"start_last_seq">>,1405428},
>>>>>                                   {<<"end_last_seq">>,1407186},
>>>>>                                   {<<"recorded_seq">>,1407186},
>>>>>                                   {<<"missing_checked">>,1721},
>>>>>                                   {<<"missing_found">>,1721},
>>>>>                                   {<<"docs_read">>,1721},
>>>>>                                   {<<"docs_written">>,1721},
>>>>>                                   {<<"doc_write_failures">>,0}]},
>>>>>                                 {[{<<"session_id">>,
>>>>>                                    <<"e60a126a2036c5fab00a1249101820c8">>},
>>>>>                                   {<<"start_time">>,
>>>>>                                    <<"Sat, 09 Jun 2012 07:47:22 GMT">>},
>>>>>                                   {<<"end_time">>,
>>>>>                                    <<"Sun, 10 Jun 2012 21:16:20 GMT">>},
>>>>>                                   {<<"start_last_seq">>,1386289},
>>>>>                                   {<<"end_last_seq">>,1405428},
>>>>>                                   {<<"recorded_seq">>,1405428},
>>>>>                                   {<<"missing_checked">>,16977},
>>>>>                                   {<<"missing_found">>,16977},
>>>>>                                   {<<"docs_read">>,16977},
>>>>>                                   {<<"docs_written">>,16977},
>>>>>                                   {<<"doc_write_failures">>,0}]},
>>>>>                                 {[{<<"session_id">>,
>>>>>                                    <<"ef3e4333d340dcf73ddfa3fe8c720042">>},
>>>>>                                   {<<"start_time">>,
>>>>>                                    <<"Mon, 04 Jun 2012 02:39:44 GMT">>},
>>>>>                                   {<<"end_time">>,
>>>>>                                    <<"Mon, 04 Jun 2012 12:35:50 GMT">>},
>>>>>                                   {<<"start_last_seq">>,1384738},
>>>>>                                   {<<"end_last_seq">>,1386289},
>>>>>                                   {<<"recorded_seq">>,1386289},
>>>>>                                   {<<"missing_checked">>,1551},
>>>>>                                   {<<"missing_found">>,1550},
>>>>>                                   {<<"docs_read">>,1550},
>>>>>                                   {<<"docs_written">>,1550},
>>>>>                                   {<<"doc_write_failures">>,0}]},
>>>>>                                 {[{<<"session_id">>,
>>>>>                                    <<"d5123a3caf462794aaf5a47be1bb3b6e">>},
>>>>>                                   {<<"start_time">>,
>>>>>                                    <<"Wed, 30 May 2012 20:41:43 GMT">>},
>>>>>                                   {<<"end_time">>,
>>>>>                                    <<"Mon, 04 Jun 2012 02:37:33 GMT">>},
>>>>>                                   {<<"start_last_seq">>,1372404},
>>>>>                                   {<<"end_last_seq">>,1384738},
>>>>>                                   {<<"recorded_seq">>,1384738},
>>>>>                                   {<<"missing_checked">>,12334},
>>>>>                                   {<<"missing_found">>,12333},
>>>>>                                   {<<"docs_read">>,12333},
>>>>>                                   {<<"docs_written">>,12333},
>>>>>                                   {<<"doc_write_failures">>,0}]},
>>>>>                                 {[{<<"session_id">>,
>>>>>                                    <<"52a16e8832f70dc094f6fff5e9b7d75b">>},
>>>>>                                   {<<"start_time">>,
>>>>>                                    <<"Sun, 27 May 2012 23:36:41 GMT">>},
>>>>>                                   {<<"end_time">>,
>>>>>                                    <<"Wed, 30 May 2012 20:40:14 GMT">>},
>>>>>                                   {<<"start_last_seq">>,1361049},
>>>>>                                   {<<"end_last_seq">>,1372404},
>>>>>                                   {<<"recorded_seq">>,1372404},
>>>>>                                   {<<"missing_checked">>,11355},
>>>>>                                   {<<"missing_found">>,11355},
>>>>>                                   {<<"docs_read">>,11355},
>>>>>                                   {<<"docs_written">>,11355},
>>>>>                                   {<<"doc_write_failures">>,0}]},
>>>>> [...lots of these...]
>>>>>
>>>>>                             [],false,[]},
>>>>>                         #Ref<0.0.15.159973>}],
>>>>>                       false,false}
>>>>> ** When Server state == {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>,
>>>>>                        <0.290.0>,<0.286.0>,<0.367.0>,
>>>>>                        {db_header,6,992456,0,
>>>>>                            {943280145,{744250,975,647546641},60017672},
>>>>>                            {943282327,745225,42485979},
>>>>>                            {943267963,[],5753},
>>>>>                            0,nil,nil,1000},
>>>>>                        992456,
>>>>>                        {btree,<0.286.0>,
>>>>>                            {943280145,{744250,975,647546641},60017672},
>>>>>                            #Fun<couch_db_updater.10.57960608>,
>>>>>                            #Fun<couch_db_updater.11.57960608>,
>>>>>                            #Fun<couch_btree.5.133731799>,
>>>>>                            #Fun<couch_db_updater.12.57960608>,snappy},
>>>>>                        {btree,<0.286.0>,
>>>>>                            {943282327,745225,42485979},
>>>>>                            #Fun<couch_db_updater.13.57960608>,
>>>>>                            #Fun<couch_db_updater.14.57960608>,
>>>>>                            #Fun<couch_btree.5.133731799>,
>>>>>                            #Fun<couch_db_updater.15.57960608>,snappy},
>>>>>                        {btree,<0.286.0>,
>>>>>                            {943267963,[],5753},
>>>>>                            #Fun<couch_btree.3.133731799>,
>>>>>                            #Fun<couch_btree.4.133731799>,
>>>>>                            #Fun<couch_btree.5.133731799>,nil,snappy},
>>>>>                        992456,<<"cbstats">>,
>>>>>                        "/Volumes/terror/db/couchdb/cbstats.couch",[],[],
>>>>>                        nil,
>>>>>                        {user_ctx,null,[],undefined},
>>>>>                        nil,1000,
>>>>>                        [before_header,after_header,on_file_open],
>>>>>                        [{user_ctx,
>>>>>                             {user_ctx,null,[<<"_admin">>],undefined}}],
>>>>>                        snappy,nil,nil}
>>>>> ** Reason for termination ==
>>>>> ** {timeout,
>>>>>   {gen_server,call,
>>>>>       [<0.288.0>,
>>>>>        {db_updated,
>>>>>            {db,<0.288.0>,<0.289.0>,nil,<<"1339637701848579">>,<0.290.0>,
>>>>>                <0.286.0>,<0.367.0>,
>>>>>                {db_header,6,992456,0,
>>>>>                    {943280145,{744250,975,647546641},60017672},
>>>>>                    {943282327,745225,42485979},
>>>>>                    {943267963,[],5753},
>>>>>                    0,nil,nil,1000},
>>>>>                992456,
>>>>>                {btree,<0.286.0>,
>>>>>                    {943280145,{744250,975,647546641},60017672},
>>>>>                    #Fun<couch_db_updater.10.57960608>,
>>>>>                    #Fun<couch_db_updater.11.57960608>,
>>>>>                    #Fun<couch_btree.5.133731799>,
>>>>>                    #Fun<couch_db_updater.12.57960608>,snappy},
>>>>>                {btree,<0.286.0>,
>>>>>                    {943282327,745225,42485979},
>>>>>                    #Fun<couch_db_updater.13.57960608>,
>>>>>                    #Fun<couch_db_updater.14.57960608>,
>>>>>                    #Fun<couch_btree.5.133731799>,
>>>>>                    #Fun<couch_db_updater.15.57960608>,snappy},
>>>>>                {btree,<0.286.0>,
>>>>>                    {943284347,[],5756},
>>>>>                    #Fun<couch_btree.3.133731799>,
>>>>>                    #Fun<couch_btree.4.133731799>,
>>>>>                    #Fun<couch_btree.5.133731799>,nil,snappy},
>>>>>                992456,<<"cbstats">>,
>>>>>                "/Volumes/terror/db/couchdb/cbstats.couch",[],[],nil,
>>>>>                {user_ctx,null,[],undefined},
>>>>>                #Ref<0.0.15.160107>,1000,
>>>>>                [before_header,after_header,on_file_open],
>>>>>                [{user_ctx,{user_ctx,null,[<<"_admin">>],undefined}}],
>>>>>                snappy,nil,nil}}]}}
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> dustin sallings
>>>>
>>>> --
>>>> dustin sallings
>
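
The duplicate state in the quoted active tasks listing (two tasks per
doc_id, each with a different replication_id) can be spotted mechanically.
The following is a rough sketch, assuming the task list has already been
fetched and parsed into Python dicts; the function name is made up for this
example, and the sample entries are trimmed copies of the ones quoted above.

```python
from collections import defaultdict


def find_duplicate_replications(tasks):
    """Return {doc_id: [replication_id, ...]} for docs with more than one task."""
    by_doc = defaultdict(list)
    for task in tasks:
        # Only replication tasks started from a replicator document carry a doc_id.
        if task.get("type") == "replication" and "doc_id" in task:
            by_doc[task["doc_id"]].append(task["replication_id"])
    return {doc_id: ids for doc_id, ids in by_doc.items() if len(ids) > 1}


if __name__ == "__main__":
    # Trimmed entries from the active tasks output quoted in this thread.
    tasks = [
        {"type": "replication", "doc_id": "cbstats-from-dogbowl",
         "replication_id": "50daecd0a29f4b7e5d102990831f3d64+continuous"},
        {"type": "replication", "doc_id": "cbstats-from-dogbowl",
         "replication_id": "411e341d5aa9a3fe636cf4ea8ba71720+continuous"},
        {"type": "replication", "doc_id": "gerrit-from-prod",
         "replication_id": "4a21031dac0d81637a23c32bad620be9+continuous"},
        {"type": "replication", "doc_id": "gerrit-from-prod",
         "replication_id": "b4ad5d3f2e5b78670e4c8364b18000e9+continuous"},
    ]
    for doc_id, ids in find_duplicate_replications(tasks).items():
        print(doc_id, ids)
```

Grouping by doc_id rather than by source and target keeps the check tied to
the replicator document, which is where the duplication shows up here.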
