incubator-couchdb-dev mailing list archives

From "Fredrik Widlund (JIRA)" <j...@apache.org>
Subject [jira] Commented: (COUCHDB-722) Continuous replication tasks fail
Date Thu, 01 Apr 2010 18:35:27 GMT

    [ https://issues.apache.org/jira/browse/COUCHDB-722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852494#action_12852494
] 

Fredrik Widlund commented on COUCHDB-722:
-----------------------------------------



Hi,

Here is a log that is probably more informative:

[info] [<0.26977.2>] 1.2.3.4 - - 'POST' /service-metrics/_compact 202
[info] [<0.146.0>] Starting compaction for db "service-metrics"
[info] [<0.26627.2>] 127.0.0.1 - - 'GET' /service-metrics/Mon0.n6-www0.n101?open_revs=["56844-2393e6afa315d62d6f98996a5402f0f7"]&revs=true&latest=true 200
[info] [<0.26704.2>] 1.2.3.5 - - 'POST' /node-metrics/_missing_revs 200
[info] [<0.26755.2>] 1.2.3.5 - - 'POST' /service-metrics/_missing_revs 200
[info] [<0.26977.2>] 1.2.3.4 - - 'PUT' /service-metrics/Mon1.n7-www0.n102 201
[info] [<0.26627.2>] 127.0.0.1 - - 'GET' /service-metrics/Mon1.n7-www0.n102?open_revs=["23834-9d230c7449a9321e51e9d5983ef00d47"]&revs=true&latest=true 200
[info] [<0.26704.2>] 1.2.3.5 - - 'POST' /service-metrics/_missing_revs 200
[info] [<0.26977.2>] 1.2.3.4 - - 'PUT' /node-metrics/Mon1.n7-n302 201
[info] [<0.26988.2>] 1.2.3.4 - - 'PUT' /service-metrics/Mon1.n7-www0.n301 201
[info] [<0.26627.2>] 127.0.0.1 - - 'GET' /node-metrics/Mon1.n7-n302?open_revs=["21751-f857ed9d519bff3f054abcd990a8182c"]&revs=true&latest=true 200
[info] [<0.26682.2>] 127.0.0.1 - - 'GET' /service-metrics/Mon1.n7-www0.n301?open_revs=["26450-c16e040281883c61c62ef7d2c4f2a7ef"]&revs=true&latest=true 200
[info] [<0.26755.2>] 1.2.3.5 - - 'POST' /service-metrics/_missing_revs 200
[info] [<0.26704.2>] 1.2.3.5 - - 'POST' /service-metrics/_bulk_docs 201
[info] [<0.26988.2>] 1.2.3.4 - - 'PUT' /service-metrics/Mon1.n7-fl1.ds18 201
[info] [<0.26755.2>] 1.2.3.5 - - 'POST' /service-metrics/_missing_revs 200
[info] [<0.26704.2>] 1.2.3.5 - - 'POST' /node-metrics/_missing_revs 200
[info] [<0.26627.2>] 127.0.0.1 - - 'GET' /service-metrics/Mon1.n7-fl1.ds18?open_revs=["21147-d965415af5ac43e96f94b9df5cdf7b2f"]&revs=true&latest=true 200
[info] [<0.146.0>] Compaction file still behind main file (update seq=359295. compact update seq=359291). Retrying.
[info] [<0.26704.2>] 1.2.3.5 - - 'POST' /service-metrics/_missing_revs 200
[info] [<0.26988.2>] 1.2.3.4 - - 'PUT' /service-metrics/Mon1.n7-www0.n101 201
[info] [<0.146.0>] Compaction file still behind main file (update seq=359296. compact update seq=359295). Retrying.
[info] [<0.26627.2>] 127.0.0.1 - - 'GET' /service-metrics/Mon1.n7-www0.n101?open_revs=["21916-c83856ef70eb8dfcd8f3449406fb4a02"]&revs=true&latest=true 200
[info] [<0.146.0>] Compaction for db "service-metrics" completed.
[info] [<0.26704.2>] 1.2.3.5 - - 'POST' /service-metrics/_missing_revs 200
[info] [<0.26627.2>] 127.0.0.1 - - 'POST' /node-metrics/_ensure_full_commit?seq=266207 201
[info] [<0.13563.1>] rebooting http://127.0.0.1:5984/node-metrics/ -> http://1.2.3.5:5984/node-metrics/ from last known replication checkpoint
[error] [<0.13563.1>] ** Generic server <0.13563.1> terminating
** Last message in was {'$gen_cast',do_checkpoint}
** When Server state == {state,<0.26586.2>,<0.26590.2>,<0.26593.2>,
                            <0.26595.2>,
                            {http_db,"http://127.0.0.1:5984/node-metrics/",
                                [],[],
                                [{"User-Agent","CouchDB/0.11.0"},
                                 {"Accept","application/json"},
                                 {"Accept-Encoding","gzip"}],
                                [],get,nil,
                                [{response_format,binary},
                                 {inactivity_timeout,30000}],
                                10,500,nil},
                            {http_db,
[...]

Fredrik Widlund, CSO / Chief Architect, Qbrick
Direct: +46 8 459 90 32 | Mobile: +46 76 899 96 66

Södra Hamnvägen 22 | 115 41 STOCKHOLM
Web and mobile: www.qbrick.com

-----Original Message-----
From: Randall Leeds (JIRA) [mailto:jira@apache.org]
Sent: 1 April 2010 18:28
To: Fredrik Widlund
Subject: [jira] Commented: (COUCHDB-722) Continuous replication tasks fail


    [ https://issues.apache.org/jira/browse/COUCHDB-722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852437#action_12852437
]

Randall Leeds commented on COUCHDB-722:
---------------------------------------

It would be helpful to know if this happens only when compaction completes.
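
One way to check this against a saved log is a small scan for a gen_server crash that is logged shortly after a "Compaction ... completed" line. This is a minimal Python sketch, not part of CouchDB; the function name and the `window` parameter are hypothetical, and the patterns only assume the line shapes shown in the log quoted in this thread.

```python
import re

# Patterns follow the [info]/[error] line shapes seen in the log in this thread.
COMPACTION_DONE = re.compile(r'Compaction for db "([^"]+)" completed')
SERVER_CRASH = re.compile(r'^\[error\] .*Generic server .* terminating')


def crashes_after_compaction(lines, window=10):
    """Return (db, crash_line_no) pairs where a gen_server crash is logged
    within `window` lines after a compaction-completed message."""
    hits = []
    last_done = None  # (db name, line number) of the latest completed compaction
    for no, line in enumerate(lines, start=1):
        m = COMPACTION_DONE.search(line)
        if m:
            last_done = (m.group(1), no)
        elif SERVER_CRASH.search(line) and last_done and no - last_done[1] <= window:
            hits.append((last_done[0], no))
    return hits
```

The scan only tests proximity in the log. It does not require the compacted database and the crashing replication to be the same one; in the log above, compaction completes for "service-metrics" while the terminating replication is for node-metrics.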

The replicator has retry logic for transient failures, but that does not include a 404 response
from the source. IMO that's a bug in the compaction code.

I'll take a closer look, though.
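
To illustrate what treating a 404 from the source as a transient failure would look like, here is a minimal Python sketch. It is not the couch_rep code and not a proposed fix (the comment above suspects the compaction code instead); `DbNotFound` and `with_retry` are hypothetical names, and the attempt count and delay are arbitrary.

```python
import time


class DbNotFound(Exception):
    """Hypothetical stand-in for a 404 / db_not_found reply from the source."""


def with_retry(operation, attempts=5, delay=0.5, sleep=time.sleep):
    """Call operation(); on DbNotFound wait and try again, up to `attempts` calls."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except DbNotFound:
            if attempt == attempts:
                raise  # still missing after the last attempt: a real failure
            sleep(delay * attempt)  # linear backoff between attempts
```

Any other exception propagates immediately, so only the db_not_found case is retried.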


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.




> Continuous replication tasks fail
> ---------------------------------
>
>                 Key: COUCHDB-722
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-722
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 0.11
>         Environment: Arch Linux, CouchDB 0.11
>            Reporter: Fredrik Widlund
>
> CouchDB 0.11.0 replication tasks fail with the error below after working for anywhere from
> a few minutes to an hour. The replication below is of the type {"source":"http://127.0.0.1:5984/node-metrics",
> "target":"http://1.2.3.4:5984/node-metrics", "continuous":true}, and the node-metrics database
> exists on both machines.
> The database is periodically compacted, which (and I'm speculating here) could be a contributing
> factor to the crash.
> Kind regards,
> Fredrik Widlund
> =CRASH REPORT==== 1-Apr-2010::14:25:26 ===
>   crasher:
>     initial call: couch_rep:init/1
>     pid: <0.274.0>
>     registered_name: []
>     exception exit: {{badmatch,
>                          {stop,
>                              {db_not_found,
>                                  <<"http://127.0.0.1:5984/node-metrics/">>}}},
>                      [{couch_rep,do_checkpoint,1},
>                       {couch_rep,handle_cast,2},
>                       {gen_server,handle_msg,5},
>                       {proc_lib,init_p_do_apply,3}]}
>       in function  gen_server:terminate/6
>     ancestors: [couch_rep_sup,couch_primary_services,couch_server_sup,
>                   <0.32.0>]
>     messages: [{'EXIT',<0.21084.1>,normal}]
>     links: [<0.81.0>]
>     dictionary: [{task_status_update,{{1270,124726,124009},0}}]
>     trap_exit: true
>     status: running
>     heap_size: 10946
>     stack_size: 24
>     reductions: 29173458
>   neighbours:
> [error] [<0.81.0>] {error_report,<0.31.0>,
>     {<0.81.0>,supervisor_report,
>      [{supervisor,{local,couch_rep_sup}},
>       {errorContext,child_terminated},
>       {reason,
>           {{badmatch,
>                {stop,
>                    {db_not_found,<<"http://127.0.0.1:5984/node-metrics/">>}}},
>            [{couch_rep,do_checkpoint,1},
>             {couch_rep,handle_cast,2},
>             {gen_server,handle_msg,5},
>             {proc_lib,init_p_do_apply,3}]}},
>       {offender,
>           [{pid,<0.274.0>},
>            {name,"f3e3081db5a215dbaf9b2984f0552090+continuous"},
>            {mfa,
>                {gen_server,start_link,
>                    [couch_rep,
>                     ["f3e3081db5a215dbaf9b2984f0552090",
>                      {[{<<"target">>,
>                         <<"http://1.2.3.4:5984/node-metrics">>},
>                        {<<"source">>,<<"http://127.0.0.1:5984/node-metrics">>},
>                        {<<"continuous">>,true}]},
>                      {user_ctx,null,
>                          [<<"_admin">>],
>                          <<"{couch_httpd_auth, default_authentication_handler}">>}],
>                     []]}},
>            {restart_type,temporary},
>            {shutdown,1},
>            {child_type,worker}]}]}}
> =SUPERVISOR REPORT==== 1-Apr-2010::14:25:26 ===
>      Supervisor: {local,couch_rep_sup}
>      Context:    child_terminated
>      Reason:     {{badmatch,
>                       {stop,
>                           {db_not_found,
>                               <<"http://127.0.0.1:5984/node-metrics/">>}}},
>                   [{couch_rep,do_checkpoint,1},
>                    {couch_rep,handle_cast,2},
>                    {gen_server,handle_msg,5},
>                    {proc_lib,init_p_do_apply,3}]}
>      Offender:   [{pid,<0.274.0>},
>                   {name,"f3e3081db5a215dbaf9b2984f0552090+continuous"},
>                   {mfa,
>                       {gen_server,start_link,
>                           [couch_rep,
>                            ["f3e3081db5a215dbaf9b2984f0552090",
>                             {[{<<"target">>,
>                                <<"http://1.2.3.4:5984/node-metrics">>},
>                               {<<"source">>,
>                                <<"http://127.0.0.1:5984/node-metrics">>},
>                               {<<"continuous">>,true}]},
>                             {user_ctx,null,
>                                 [<<"_admin">>],
>                                 <<"{couch_httpd_auth, default_authentication_handler}">>}],
>                            []]}},
>                   {restart_type,temporary},
>                   {shutdown,1},
>                   {child_type,worker}]
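
For reference, a continuous replication like the one in the issue description is started by POSTing a JSON body to CouchDB's /_replicate endpoint. The following is a minimal Python sketch, assuming the node listens on http://127.0.0.1:5984 as in the report; the helper name `build_replication_request` is hypothetical, and nothing is sent until the request is opened.

```python
import json
import urllib.request


def build_replication_request(source, target, continuous=True,
                              couch="http://127.0.0.1:5984"):
    """Build (but do not send) a POST to /_replicate for the given endpoints."""
    body = {"source": source, "target": target, "continuous": continuous}
    return urllib.request.Request(
        couch + "/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_replication_request(
    "http://127.0.0.1:5984/node-metrics",
    "http://1.2.3.4:5984/node-metrics",
)
# To actually start the task (requires a reachable CouchDB node):
# urllib.request.urlopen(req)
```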


