couchdb-dev mailing list archives

From "Adam Kocoloski (JIRA)" <j...@apache.org>
Subject [jira] Commented: (COUCHDB-416) Replicating shards into a single aggregation node may cause endless respawning
Date Thu, 16 Jul 2009 16:33:15 GMT

    [ https://issues.apache.org/jira/browse/COUCHDB-416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732010#action_12732010 ]

Adam Kocoloski commented on COUCHDB-416:
----------------------------------------

Hi Enda, thanks for this report.  There should be some lines in the logs on the remote servers
around the time that this occurs that correspond to requests for

/labuk_braintestbritian/_all_docs_by_seq?limit=100&startkey=...

Can you check those logs to see if there was anything "out of the ordinary" -- e.g. a status
code that was not 200?
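
One way to run the check described above is a small script that scans a log for `_all_docs_by_seq` requests and reports any whose status is not 200. This is only a sketch: it assumes each request line in the remote server's log ends with the HTTP status code, which this thread does not confirm, and the function name `unusual_requests` is made up for illustration.

```python
import re

# ASSUMPTION (not confirmed in this thread): a request log line ends with
# the HTTP status code, for example
#   [date] [info] [<pid>] <client> - - 'GET' /db/_all_docs_by_seq?limit=100&startkey=... 200
STATUS_AT_END = re.compile(r"\s(\d{3})\s*$")


def unusual_requests(lines, fragment="_all_docs_by_seq"):
    """Yield (status, line) for matching requests whose status is not 200."""
    for line in lines:
        if fragment not in line:
            continue
        match = STATUS_AT_END.search(line)
        if match is None:
            # No trailing three-digit status; not a request line we understand.
            continue
        status = int(match.group(1))
        if status != 200:
            yield status, line.rstrip("\n")
```

Passing an open log file works, since file objects iterate line by line, e.g. `for status, line in unusual_requests(open(path)): print(status, line)`, where `path` is wherever the remote server writes its log. If the log format differs, only the `STATUS_AT_END` pattern should need changing.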



> Replicating shards into a single aggregation node may cause endless respawning
> ------------------------------------------------------------------------------
>
>                 Key: COUCHDB-416
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-416
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Database Core
>    Affects Versions: 0.9
>         Environment: couchdb 0.9.0.r766883 CentOS x86_64
>            Reporter: Enda Farrell
>            Assignee: Adam Kocoloski
>            Priority: Critical
>         Attachments: Picture 2.png
>
>
> I have a set of CouchDB instances, each one acting as a shard for a large set of data.
> Occasionally, we replicate each instance's database into a different CouchDB instance.
> We always "pull" replicate (see image attached).
> When we do this, we often see errors like this on the target instance:
> * [Thu, 16 Jul 2009 13:52:32 GMT] [error] [emulator] Error in process <0.29787.102> with exit value: {function_clause,[{lists,map,[#Fun<couch_rep.6.75683565>,undefined]},{couch_rep,enum_docs_since,4}]}
> *
> * [Thu, 16 Jul 2009 13:52:32 GMT] [error] [<0.7456.6>] replication enumerator exited with {function_clause,
> *                                     [{lists,map,
> *                                       [#Fun<couch_rep.6.75683565>,undefined]},
> *                                      {couch_rep,enum_docs_since,4}]} .. respawning
> Once this starts, it is fatal to the CouchDB instance. It logs these messages at over
> 1000 per second (log level = severe) and chews up HDD.
> No errors (other than an HTTP timeout) are seen.
> After a database had gone "respawning", the target node was shut down, the logs were cleared,
> and the target node was restarted. The log was tailed and all was quiet. Once a single
> replication was called again against this database, it immediately went back into respawning
> hell. There were no stacked replications in this case.
> From this it seems that if a database ever goes into "respawning" it cannot recover
> (when your environment/setup requires replication to occur always).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

