couchdb-dev mailing list archives

From "Fredrik Widlund (JIRA)" <j...@apache.org>
Subject [jira] Commented: (COUCHDB-722) Continuous replication tasks fail
Date Thu, 01 Apr 2010 19:55:27 GMT

    [ https://issues.apache.org/jira/browse/COUCHDB-722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852524#action_12852524 ]

Fredrik Widlund commented on COUCHDB-722:
-----------------------------------------



A collection of the crashes, grepped from the log, in case it's helpful.

[root@db3 scripts]# grep -B2 -A2 -E "\[error\] .*terminating" couchdb.stdout
[info] [<0.6318.0>] 127.0.0.1 - - 'POST' /node-metrics/_ensure_full_commit?seq=69308 201
[info] [<0.291.0>] rebooting http://127.0.0.1:5984/node-metrics/ -> http://1.2.3.5:5984/node-metrics/ from last known replication checkpoint
[error] [<0.291.0>] ** Generic server <0.291.0> terminating
** Last message in was {'$gen_cast',do_checkpoint}
** When Server state == {state,<0.6763.0>,<0.6767.0>,<0.6770.0>,<0.6772.0>,
--
[info] [<0.31361.1>] 127.0.0.1 - - 'POST' /service-metrics/_ensure_full_commit?seq=98608 201
[info] [<0.273.0>] rebooting http://127.0.0.1:5984/service-metrics/ -> http://1.2.3.5:5984/service-metrics/ from last known replication checkpoint
[error] [<0.273.0>] ** Generic server <0.273.0> terminating
** Last message in was {'$gen_cast',do_checkpoint}
** When Server state == {state,<0.31620.1>,<0.31625.1>,<0.31627.1>,
--
[info] [<0.24154.5>] 127.0.0.1 - - 'POST' /node-metrics/_ensure_full_commit?seq=230868 201
[info] [<0.15120.5>] rebooting http://127.0.0.1:5984/node-metrics/ -> http://1.2.3.5:5984/node-metrics/ from last known replication checkpoint
[error] [<0.15120.5>] ** Generic server <0.15120.5> terminating
** Last message in was {'$gen_cast',do_checkpoint}
** When Server state == {state,<0.24125.5>,<0.24129.5>,<0.24132.5>,
--
[info] [<0.4380.7>] 127.0.0.1 - - 'POST' /node-metrics/_ensure_full_commit?seq=248027 201
[info] [<0.3606.7>] rebooting http://127.0.0.1:5984/node-metrics/ -> http://1.2.3.5:5984/node-metrics/ from last known replication checkpoint
[error] [<0.3606.7>] ** Generic server <0.3606.7> terminating
** Last message in was {'$gen_cast',do_checkpoint}
** When Server state == {state,<0.4317.7>,<0.4322.7>,<0.4324.7>,<0.4326.7>,
--
[info] [<0.15414.7>] 127.0.0.1 - - 'POST' /service-metrics/_ensure_full_commit?seq=231731 201
[info] [<0.15142.5>] rebooting http://127.0.0.1:5984/service-metrics/ -> http://1.2.3.5:5984/service-metrics/ from last known replication checkpoint
[error] [<0.15142.5>] ** Generic server <0.15142.5> terminating
** Last message in was {'$gen_cast',do_checkpoint}
** When Server state == {state,<0.15516.7>,<0.15521.7>,<0.15523.7>,
--
[info] [<0.26905.7>] 127.0.0.1 - - 'POST' /node-metrics/_ensure_full_commit?seq=255490 201
[info] [<0.16250.7>] rebooting http://127.0.0.1:5984/node-metrics/ -> http://1.2.3.5:5984/node-metrics/ from last known replication checkpoint
[error] [<0.16250.7>] ** Generic server <0.16250.7> terminating
** Last message in was {'$gen_cast',do_checkpoint}
** When Server state == {state,<0.27125.7>,<0.27129.7>,<0.27132.7>,
--
[info] [<0.8487.8>] 127.0.0.1 - - 'POST' /service-metrics/_ensure_full_commit?seq=240461 201
[info] [<0.16228.7>] rebooting http://127.0.0.1:5984/service-metrics/ -> http://1.2.3.5:5984/service-metrics/ from last known replication checkpoint
[error] [<0.16228.7>] ** Generic server <0.16228.7> terminating
** Last message in was {'$gen_cast',do_checkpoint}
** When Server state == {state,<0.8531.8>,<0.8536.8>,<0.8538.8>,<0.8540.8>,
--
[info] [<0.15483.8>] 127.0.0.1 - - 'POST' /service-metrics/_ensure_full_commit?seq=247246 201
[info] [<0.15504.8>] rebooting http://127.0.0.1:5984/service-metrics/ -> http://1.2.3.5:5984/service-metrics/ from last known replication checkpoint
[error] [<0.15504.8>] ** Generic server <0.15504.8> terminating
** Last message in was {'$gen_cast',do_checkpoint}
** When Server state == {state,<0.15557.8>,<0.15563.8>,<0.15567.8>,
--
[info] [<0.15481.8>] rebooting http://127.0.0.1:5984/node-metrics/ -> http://1.2.3.5:5984/node-metrics/ from last known replication checkpoint
[info] [<0.16982.8>] 1.2.3.5 - - 'POST' /node-metrics/_ensure_full_commit 201
[error] [<0.15481.8>] ** Generic server <0.15481.8> terminating
** Last message in was {'$gen_cast',do_checkpoint}
** When Server state == {state,<0.16926.8>,<0.16930.8>,<0.16933.8>,
--
[info] [<0.20255.8>] 127.0.0.1 - - 'POST' /node-metrics/_ensure_full_commit?seq=269770 201
[info] [<0.18127.8>] rebooting http://127.0.0.1:5984/node-metrics/ -> http://1.2.3.5:5984/node-metrics/ from last known replication checkpoint
[error] [<0.18127.8>] ** Generic server <0.18127.8> terminating
** Last message in was {'$gen_cast',do_checkpoint}
** When Server state == {state,<0.20451.8>,<0.20455.8>,<0.20458.8>,
--
[info] [<0.30782.8>] 127.0.0.1 - - 'POST' /node-metrics/_ensure_full_commit?seq=272628 201
[info] [<0.22327.8>] rebooting http://127.0.0.1:5984/node-metrics/ -> http://1.2.3.5:5984/node-metrics/ from last known replication checkpoint
[error] [<0.22327.8>] ** Generic server <0.22327.8> terminating
** Last message in was {'$gen_cast',do_checkpoint}
** When Server state == {state,<0.30991.8>,<0.30995.8>,<0.30998.8>,
--
[info] [<0.20666.1>] 127.0.0.1 - - 'POST' /node-metrics/_ensure_full_commit?seq=288432 201
[info] [<0.274.0>] rebooting http://127.0.0.1:5984/node-metrics/ -> http://1.2.3.5:5984/node-metrics/ from last known replication checkpoint
[error] [<0.274.0>] ** Generic server <0.274.0> terminating
** Last message in was {'$gen_cast',do_checkpoint}
** When Server state == {state,<0.20538.1>,<0.20542.1>,<0.20545.1>,
--
[info] [<0.28001.2>] 127.0.0.1 - - 'POST' /node-metrics/_ensure_full_commit?seq=295892 201
[info] [<0.21122.1>] rebooting http://127.0.0.1:5984/node-metrics/ -> http://1.2.3.5:5984/node-metrics/ from last known replication checkpoint
[error] [<0.21122.1>] ** Generic server <0.21122.1> terminating
** Last message in was {'$gen_cast',do_checkpoint}
** When Server state == {state,<0.27919.2>,<0.27923.2>,<0.27926.2>,
--
[info] [<0.16437.3>] 1.2.3.4 - - 'GET' /service-metrics/_design/views/_view/allmetrics 200
[error] [<0.256.0>] couch_rep_httpc request failed after 10 retries: http://1.2.3.5:5984/service-metrics/
[error] [<0.256.0>] ** Generic server <0.256.0> terminating
** Last message in was {'$gen_cast',do_checkpoint}
** When Server state == {state,<0.15441.3>,<0.15446.3>,<0.15448.3>,
--
    reductions: 4021
  neighbours:
[error] [<0.15448.3>] ** Generic server <0.15448.3> terminating
** Last message in was {'EXIT',<0.15449.3>,
                           {{http_request_failed,
--
                  {child_type,worker}]

[error] [<0.15441.3>] ** Generic server <0.15441.3> terminating
** Last message in was {'EXIT',<0.256.0>,
                           {http_request_failed,
--
  neighbours:
[error] [<0.9022.3>] couch_rep_httpc request failed after 10 retries: http://1.2.3.5:5984/node-metrics/
[error] [<0.9022.3>] ** Generic server <0.9022.3> terminating
** Last message in was {'$gen_cast',do_checkpoint}
** When Server state == {state,<0.15170.3>,<0.15174.3>,<0.15177.3>,
--
                  {child_type,worker}]

[error] [<0.15174.3>] ** Generic server <0.15174.3> terminating
** Last message in was {'EXIT',<0.9022.3>,
                           {http_request_failed,
--
    reductions: 7005
  neighbours:
[error] [<0.15177.3>] ** Generic server <0.15177.3> terminating
** Last message in was {'EXIT',<0.9022.3>,
                           {http_request_failed,
--
                  {stack_size,15},
                  {reductions,5200}]
[error] [<0.15170.3>] ** Generic server <0.15170.3> terminating
** Last message in was {'EXIT',<0.9022.3>,
                           {http_request_failed,
--
[info] [<0.31343.3>] 127.0.0.1 - - 'POST' /service-metrics/_ensure_full_commit?seq=292618 201
[info] [<0.18230.3>] rebooting http://127.0.0.1:5984/service-metrics/ -> http://1.2.3.5:5984/service-metrics/ from last known replication checkpoint
[error] [<0.18230.3>] ** Generic server <0.18230.3> terminating
** Last message in was {'$gen_cast',do_checkpoint}
** When Server state == {state,<0.31889.3>,<0.31894.3>,<0.31896.3>,
--
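To see at a glance which replications the crashes belong to, you can count the `rebooting <source> -> <target>` lines and the `couch_rep_httpc request failed after N retries: <url>` lines per database. Below is a minimal Python sketch that assumes the log format shown above. The function name `tally_replication_failures` is hypothetical. The patterns match only on the line that contains `rebooting`, so they work whether or not the rest of the message has been wrapped onto following lines.

```python
import re
from collections import Counter

# Matches e.g. "rebooting http://127.0.0.1:5984/node-metrics/ -> http://1.2.3.5:5984/node-metrics/"
REBOOT_RE = re.compile(r"rebooting (\S+) -> (\S+)")
# Matches e.g. "couch_rep_httpc request failed after 10 retries: http://1.2.3.5:5984/node-metrics/"
HTTPC_FAIL_RE = re.compile(r"couch_rep_httpc request failed after \d+ retries: (\S+)")


def tally_replication_failures(log_text):
    """Count checkpoint reboots per (source, target) pair and
    exhausted-retry HTTP failures per URL in CouchDB log text.

    Hypothetical helper; assumes the log format seen in this thread.
    """
    reboots = Counter()
    http_failures = Counter()
    for line in log_text.splitlines():
        m = REBOOT_RE.search(line)
        if m:
            reboots[(m.group(1), m.group(2))] += 1
        m = HTTPC_FAIL_RE.search(line)
        if m:
            http_failures[m.group(1)] += 1
    return reboots, http_failures
```

To use it, pass in the whole log, e.g. `tally_replication_failures(open("couchdb.stdout").read())`, then print `reboots.most_common()`. In the excerpt above, both the node-metrics and the service-metrics replications end the same way: a reboot from the last checkpoint, followed by a termination while handling `do_checkpoint`.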


Fredrik Widlund, CSO / Chief Architect, Qbrick
Direct: +46 8 459 90 32 | Mobile: +46 76 899 96 66

Södra Hamnvägen 22 | 115 41 STOCKHOLM
Web and mobile: www.qbrick.com

-----Original Message-----
From: Randall Leeds (JIRA) [mailto:jira@apache.org]
Sent: 1 April 2010 21:12
To: Fredrik Widlund
Subject: [jira] Commented: (COUCHDB-722) Continuous replication tasks fail


    [ https://issues.apache.org/jira/browse/COUCHDB-722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852510#action_12852510 ]

Randall Leeds commented on COUCHDB-722:
---------------------------------------

I'm rather confused.

The compaction seems to be on the service-metrics database, but the replication is between
databases named node-metrics.
However, there's a POST to /service-metrics/_missing_revs on the target database right around
the time compaction completes. Replication performs this operation. Are you using vhosts or
some kind of proxy layer that's rewriting any of your requests? Could you include a little
bit more context at the end where you put the ...? In particular I want to know if the replication
was using the service-metrics database at all.


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.




> Continuous replication tasks fail
> ---------------------------------
>
>                 Key: COUCHDB-722
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-722
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 0.11
>         Environment: Arch Linux, CouchDB 0.11
>            Reporter: Fredrik Widlund
>
> CouchDB 0.11.0 replication tasks fail with the error below after working for anywhere from
> a few minutes to an hour. The replication below is of the type {"source":"http://127.0.0.1:5984/node-metrics",
> "target":"http://1.2.3.4:5984/node-metrics", "continuous":true}, and the node-metrics database
> exists on both machines.
> The database is periodically compacted, which (and I'm speculating here) could be a contributing
> factor to the crash.
> Kind regards,
> Fredrik Widlund
> =CRASH REPORT==== 1-Apr-2010::14:25:26 ===
>   crasher:
>     initial call: couch_rep:init/1
>     pid: <0.274.0>
>     registered_name: []
>     exception exit: {{badmatch,
>                          {stop,
>                              {db_not_found,
>                                  <<"http://127.0.0.1:5984/node-metrics/">>}}},
>                      [{couch_rep,do_checkpoint,1},
>                       {couch_rep,handle_cast,2},
>                       {gen_server,handle_msg,5},
>                       {proc_lib,init_p_do_apply,3}]}
>       in function  gen_server:terminate/6
>     ancestors: [couch_rep_sup,couch_primary_services,couch_server_sup,
>                   <0.32.0>]
>     messages: [{'EXIT',<0.21084.1>,normal}]
>     links: [<0.81.0>]
>     dictionary: [{task_status_update,{{1270,124726,124009},0}}]
>     trap_exit: true
>     status: running
>     heap_size: 10946
>     stack_size: 24
>     reductions: 29173458
>   neighbours:
> [error] [<0.81.0>] {error_report,<0.31.0>,
>     {<0.81.0>,supervisor_report,
>      [{supervisor,{local,couch_rep_sup}},
>       {errorContext,child_terminated},
>       {reason,
>           {{badmatch,
>                {stop,
>                    {db_not_found,<<"http://127.0.0.1:5984/node-metrics/">>}}},
>            [{couch_rep,do_checkpoint,1},
>             {couch_rep,handle_cast,2},
>             {gen_server,handle_msg,5},
>             {proc_lib,init_p_do_apply,3}]}},
>       {offender,
>           [{pid,<0.274.0>},
>            {name,"f3e3081db5a215dbaf9b2984f0552090+continuous"},
>            {mfa,
>                {gen_server,start_link,
>                    [couch_rep,
>                     ["f3e3081db5a215dbaf9b2984f0552090",
>                      {[{<<"target">>,
>                         <<"http://1.2.3.4:5984/node-metrics">>},
>                        {<<"source">>,<<"http://127.0.0.1:5984/node-metrics">>},
>                        {<<"continuous">>,true}]},
>                      {user_ctx,null,
>                          [<<"_admin">>],
>                          <<"{couch_httpd_auth, default_authentication_handler}">>}],
>                     []]}},
>            {restart_type,temporary},
>            {shutdown,1},
>            {child_type,worker}]}]}}
> =SUPERVISOR REPORT==== 1-Apr-2010::14:25:26 ===
>      Supervisor: {local,couch_rep_sup}
>      Context:    child_terminated
>      Reason:     {{badmatch,
>                       {stop,
>                           {db_not_found,
>                               <<"http://127.0.0.1:5984/node-metrics/">>}}},
>                   [{couch_rep,do_checkpoint,1},
>                    {couch_rep,handle_cast,2},
>                    {gen_server,handle_msg,5},
>                    {proc_lib,init_p_do_apply,3}]}
>      Offender:   [{pid,<0.274.0>},
>                   {name,"f3e3081db5a215dbaf9b2984f0552090+continuous"},
>                   {mfa,
>                       {gen_server,start_link,
>                           [couch_rep,
>                            ["f3e3081db5a215dbaf9b2984f0552090",
>                             {[{<<"target">>,
>                                <<"http://1.2.3.4:5984/node-metrics">>},
>                               {<<"source">>,
>                                <<"http://127.0.0.1:5984/node-metrics">>},
>                               {<<"continuous">>,true}]},
>                             {user_ctx,null,
>                                 [<<"_admin">>],
>                                 <<"{couch_httpd_auth, default_authentication_handler}">>}],
>                            []]}},
>                   {restart_type,temporary},
>                   {shutdown,1},
>                   {child_type,worker}]
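If you want to reproduce this, you can start the continuous replication described in the issue by POSTing that JSON document to the server's `/_replicate` endpoint. Below is a minimal sketch using only Python's standard library. It assumes the host and database names from the report. The helper name `make_replicate_request` is hypothetical, and the sketch only builds the request; it does not send it.

```python
import json
import urllib.request


def make_replicate_request(server, source, target, continuous=True):
    """Build (but do not send) a POST /_replicate request for CouchDB.

    Hypothetical helper; the body mirrors the replication document in the report.
    """
    body = {"source": source, "target": target, "continuous": continuous}
    return urllib.request.Request(
        url=server.rstrip("/") + "/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = make_replicate_request(
    "http://127.0.0.1:5984",
    source="http://127.0.0.1:5984/node-metrics",
    target="http://1.2.3.4:5984/node-metrics",
)
# To actually start the replication against a live server:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read().decode("utf-8"))
```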


