couchdb-dev mailing list archives

From "Adam Kocoloski (JIRA)" <>
Subject [jira] Commented: (COUCHDB-416) Replicating shards into a single aggregation node may cause endless respawning
Date Thu, 16 Jul 2009 16:33:15 GMT


Adam Kocoloski commented on COUCHDB-416:

Hi Enda, thanks for this report. There should be some lines in the logs on the remote servers around the time that this occurs that correspond to requests for


Can you check those logs to see if there was anything "out of the ordinary" -- e.g. a status code that was not 200?
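For context on the trace quoted below: it shows lists:map/2 being applied to the atom 'undefined' instead of a list, which always raises function_clause because lists:map/2 only has clauses matching [H|T] and []. A minimal sketch that reproduces the same error shape (illustrative only, not couch_rep code; the fun name is hypothetical):

    %% repro_sketch.erl - illustration only; reproduces the error shape,
    %% not CouchDB's actual replication path.
    -module(repro_sketch).
    -export([demo/0]).

    demo() ->
        F = fun(DocInfo) -> DocInfo end,
        %% lists:map(F, [a, b]) succeeds; passing 'undefined' crashes with
        %% function_clause, matching the {lists,map,[#Fun<...>,undefined]}
        %% frame in the trace below:
        catch lists:map(F, undefined).
        %% => {'EXIT',{function_clause,[{lists,map,[F,undefined]},...]}}

If enum_docs_since is being handed 'undefined' where it expects a list of document infos (an assumption based only on the trace), every retry will hit the same clause deterministically.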

> Replicating shards into a single aggregation node may cause endless respawning
> ------------------------------------------------------------------------------
>                 Key: COUCHDB-416
>                 URL:
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Database Core
>    Affects Versions: 0.9
>         Environment: couchdb 0.9.0.r766883 CentOS x86_64
>            Reporter: Enda Farrell
>            Assignee: Adam Kocoloski
>            Priority: Critical
>         Attachments: Picture 2.png
> I have a set of CouchDB instances, each one acting as a shard for a large set of data.
> Occasionally, we replicate each instance's database into a different CouchDB instance.
> We always "pull" replicate (see the attached image).
> When we do this, we often see errors like this on the target instance:
> * [Thu, 16 Jul 2009 13:52:32 GMT] [error] [emulator] Error in process <0.29787.102> with exit value: {function_clause,[{lists,map,[#Fun<couch_rep.6.75683565>,undefined]},{couch_rep,enum_docs_since,4}]}
> * [Thu, 16 Jul 2009 13:52:32 GMT] [error] [<0.7456.6>] replication enumerator exited with {function_clause,
> *                                     [{lists,map,
> *                                       [#Fun<couch_rep.6.75683565>,undefined]},
> *                                      {couch_rep,enum_docs_since,4}]} .. respawning
> Once this starts, it is fatal to the CouchDB instance. It logs these messages at over 1000 per second (log level = severe) and chews up disk space.
> No other errors (apart from an HTTP timeout) are seen.
> After a database had gone "respawning", the target node was shut down, logs were cleared, and the target node was restarted. The log was tailed - all was quiet. As soon as a single replication was run against this database, it immediately went back into respawning hell. There were no stacked replications in this case.
> From this it seems that if a database ever goes into "respawning", it cannot recover (when your environment/setup requires replication to occur continually).
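
The ".. respawning" line in the trace suggests the enumerator is restarted immediately after every exit. As an illustration of why a deterministic crash then floods the log, here is a sketch of a restart loop with no backoff (assumed behavior inferred from the symptom, not couch_rep's actual supervision code):

    %% respawn_sketch.erl - illustration only. Restarting a worker with no
    %% backoff turns a deterministic crash into a tight log-flooding loop,
    %% consistent with the ">1000 messages per second" symptom above.
    -module(respawn_sketch).
    -export([supervise/1]).

    supervise(WorkFun) ->
        process_flag(trap_exit, true),
        Pid = spawn_link(WorkFun),
        receive
            {'EXIT', Pid, Reason} ->
                error_logger:error_msg(
                    "replication enumerator exited with ~p .. respawning~n",
                    [Reason]),
                %% No delay and no crash counter: if WorkFun always fails
                %% (e.g. the function_clause above), this loops forever.
                supervise(WorkFun)
        end.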

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
