couchdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Randall Leeds (JIRA)" <j...@apache.org>
Subject [jira] Commented: (COUCHDB-597) Replication tasks crash.
Date Sun, 07 Feb 2010 21:04:27 GMT

    [ https://issues.apache.org/jira/browse/COUCHDB-597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830770#action_12830770
] 

Randall Leeds commented on COUCHDB-597:
---------------------------------------

Seems totally possible to me that it would take CouchDB more than a second to respond with
headers. I've seen it take CouchDB more than a second to respond to / with the welcome message
when load is high enough.

If attachments are downloaded over the same connection as documents I don't really see how
we can get any better than cranking the timeout and praying it's enough for most cases. If
attachments had their own connection then the primary changes feed could heartbeat continuous
replications, but I don't like this solution much.

Is there any reason a receiver needs a timeout at all for this? If the source process dies
or CouchDB dies entirely on the other end we should get an error reading from the socket.
If we don't, who are we to assume that the source isn't just slow? Doesn't seem like there
should be a timeout at all to me.

> Replication tasks crash.
> ------------------------
>
>                 Key: COUCHDB-597
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-597
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Database Core
>    Affects Versions: 0.11
>            Reporter: Robert Newson
>
> If I kick off 10 replication tasks in quick succession, occasionally one or two of the
replication tasks will die and not be resumed. It seems that the stat tracking is a little
buggy, and under stress can eventually cause a permanent failure of the supervised replication
task;
> [Fri, 11 Dec 2009 19:00:08 GMT] [error] [<0.80.0>] {error_report,<0.30.0>,
>     {<0.80.0>,supervisor_report,
>      [{supervisor,{local,couch_rep_sup}},
>       {errorContext,shutdown_error},
>       {reason,killed},
>       {offender,
>           [{pid,<0.6700.11>},
>            {name,"fcbb13200a1618cf983b347f4d2c9835+create_target"},
>            {mfa,
>                {gen_server,start_link,
>                    [couch_rep,
>                     ["fcbb13200a1618cf983b347f4d2c9835",
>                      {[{<<"create_target">>,true},
>                        {<<"source">>,<<"http://node:5984/perf-p2">>},
>                        {<<"target">>,<<"perf-p2">>}]},
>                      {user_ctx,null,[<<"_admin">>]}],
>                     []]}},
>            {restart_type,temporary},
>            {shutdown,1},
>            {child_type,worker}]}]}}
> [Fri, 11 Dec 2009 19:00:08 GMT] [error] [emulator] Error in process <0.6705.11>
with exit value: {badarg,[{ets,insert,[stats_hit_table,{{couchdb,open_os_files},-1}]},{couch_stats_collector,decrement,1}]}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message