couchdb-dev mailing list archives

From "Adam Kocoloski (JIRA)" <j...@apache.org>
Subject [jira] Commented: (COUCHDB-416) Replicating shards into a single aggregation node may cause endless respawning
Date Sat, 15 Aug 2009 02:22:15 GMT

    [ https://issues.apache.org/jira/browse/COUCHDB-416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743523#action_12743523 ]

Adam Kocoloski commented on COUCHDB-416:
----------------------------------------

Hi Enda, good sleuthing.  Trunk now throws a db_not_found exception if the source and/or target
DB does not exist.  We should probably clean up the error message that gets propagated to
the client, but now it won't respawn like mad and blow up logs.

I'm fairly certain that the missing DB was the problem.  Multiple sources replicating to the
same target should work just fine.  If you give the OK I'll close this ticket.
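Until a build with the db_not_found check is deployed, a client can avoid pointing the replicator at a database that does not exist. The sketch below is a minimal client-side precaution, not the fix that went into trunk. It assumes the 0.9-era HTTP API: POST /_replicate on the target node with a JSON body of "source" and "target", GET on a database URL returning 404 when the database is missing, and PUT creating it. The helper names and host names are hypothetical.

```python
import json
import urllib.error
import urllib.request


def db_exists(db_url, opener=urllib.request.urlopen):
    """Hypothetical helper: True if GET <db_url> succeeds, False on HTTP 404."""
    try:
        with opener(db_url) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # any other HTTP error is unexpected; surface it


def pull_replicate(target_node, source_db_url, target_db,
                   opener=urllib.request.urlopen):
    """Hypothetical helper: start a pull replication on target_node.

    Refuses to start if the remote source database is missing, and creates
    the local target database first if it does not exist yet.
    """
    # Fail fast on a missing source instead of handing it to the replicator.
    if not db_exists(source_db_url, opener):
        raise LookupError(f"source database not found: {source_db_url}")

    # Create the target database with PUT if it is not there yet.
    target_db_url = f"{target_node}/{target_db}"
    if not db_exists(target_db_url, opener):
        with opener(urllib.request.Request(target_db_url, method="PUT")):
            pass

    # "Pull" replication: POST to the target node, with the remote URL as
    # source and the local database name as target.
    body = json.dumps({"source": source_db_url, "target": target_db}).encode()
    req = urllib.request.Request(
        f"{target_node}/_replicate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with opener(req) as resp:
        return json.load(resp)
```

For example, pull_replicate("http://agg.example:5984", "http://shard1.example:5984/data", "aggregate") would be called once per shard against the aggregation node. The opener parameter exists only so the HTTP calls can be swapped out for testing. Checking first leaves a small race if a database is deleted between the check and the replication, so it complements the server-side check rather than replacing it.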

> Replicating shards into a single aggregation node may cause endless respawning
> ------------------------------------------------------------------------------
>
>                 Key: COUCHDB-416
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-416
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Database Core
>    Affects Versions: 0.9
>         Environment: couchdb 0.9.0.r766883 CentOS x86_64
>            Reporter: Enda Farrell
>            Assignee: Adam Kocoloski
>            Priority: Critical
>         Attachments: Picture 2.png
>
>
> I have a set of CouchDB instances, each one acting as a shard for a large set of data.
> Occasionally, we replicate each instance's database into a different CouchDB instance.
> We always "pull" replicate (see image attached).
> When we do this, we often see errors like this on the target instance:
> * [Thu, 16 Jul 2009 13:52:32 GMT] [error] [emulator] Error in process <0.29787.102> with exit value: {function_clause,[{lists,map,[#Fun<couch_rep.6.75683565>,undefined]},{couch_rep,enum_docs_since,4}]}
> * [Thu, 16 Jul 2009 13:52:32 GMT] [error] [<0.7456.6>] replication enumerator exited with {function_clause,
> *                                     [{lists,map,
> *                                       [#Fun<couch_rep.6.75683565>,undefined]},
> *                                      {couch_rep,enum_docs_since,4}]} .. respawning
> Once this starts, it is fatal to the CouchDB instance. It logs these messages at over 1000 per second (log level = severe) and fills up the disk.
> No other errors (apart from an HTTP timeout) are seen.
> After a database had gone into "respawning", the target node was shut down, the logs were cleared, and the target node was restarted. The log was tailed and all was quiet. As soon as a single replication was called again against this database, it immediately went back into respawning hell. There were no stacked replications in this case.
> From this it seems that if a database ever goes into "respawning", it cannot recover (when your environment/setup requires replication to run all the time).


