couchdb-dev mailing list archives

From "Mathias Leppich (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (COUCHDB-1359) Spurious "checkpoint failure: conflict (are you replicating to yourself?)"
Date Fri, 16 Dec 2011 09:04:30 GMT

    [ https://issues.apache.org/jira/browse/COUCHDB-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170840#comment-13170840 ]

Mathias Leppich commented on COUCHDB-1359:
------------------------------------------

I ran into the same issue. I was replicating from couch-A (version 1.0.3) to couch-B (version
1.1.1) via couch-B:

    curl -X POST -H 'Content-Type: application/json' -d '{
     "source":"some-db", 
     "target":"http://couch-A/some-db",
     "filter":"filters_erl/no_design",
     "continuous":true
    }' 'http://couch-B/_replicate'

Then suddenly (after more than 2 days of replication) I got the following error in the log:

    [Thu, 15 Dec 2011 19:10:00 GMT] [error] [<0.259.0>] checkpoint failure: conflict (are you replicating to yourself?)

The replication then stopped, although it still showed up in _active_tasks. It did not
start replicating again until I canceled the replication and re-initiated it. So no
couch restart was required, but I had to cancel the replication first and then start it again.
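
The cancel-then-restart step described above could be scripted roughly as follows. This is only a minimal sketch, not something taken from the report: the function names and the COUCH_URL variable are made up for illustration, and it assumes that the cancel request has to repeat the original _replicate body with "cancel":true added.

```shell
#!/bin/sh
# Sketch: cancel a stalled continuous replication, then start it again.
# COUCH_URL and all function names are illustrative assumptions; the body
# fields mirror the _replicate request shown earlier in this comment.

COUCH_URL="${COUCH_URL:-http://couch-B}"

# Print the JSON body for _replicate. Pass "cancel" as the first argument to
# build the cancellation request, which is assumed to need the same fields as
# the original request plus "cancel":true.
replication_body() {
    cancel_field=''
    if [ "$1" = "cancel" ]; then
        cancel_field=',"cancel":true'
    fi
    printf '{"source":"some-db","target":"http://couch-A/some-db","filter":"filters_erl/no_design","continuous":true%s}\n' "$cancel_field"
}

# POST a JSON body to the _replicate endpoint of the local couch.
post_replicate() {
    curl -s -X POST -H 'Content-Type: application/json' \
         -d "$1" "$COUCH_URL/_replicate"
}

# Cancel first, then re-initiate the same replication.
restart_replication() {
    post_replicate "$(replication_body cancel)"
    post_replicate "$(replication_body)"
}
```

Calling restart_replication after sourcing this file sends the two requests in order. Whether the cancel succeeds when the replication is already in this stalled state may depend on the CouchDB version, so the responses are worth checking by hand.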

I should add that even though these two couches run different versions, they are both
based on the same database file: couch-B was created from a filesystem snapshot of couch-A.
Both DBs hold about 52M docs with 70M seq.
                
> Spurious "checkpoint failure: conflict (are you replicating to yourself?)"
> --------------------------------------------------------------------------
>
>                 Key: COUCHDB-1359
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1359
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 1.1.1
>         Environment: Centos 5.6/x64 - spidermonkey 1.8.5, couch 1.1.1 patched for COUCHDB-1333 and COUCHDB-1340
>            Reporter: Alex Markham
>              Labels: PUT, _local, checkpoint, conflict, replication, slow
>
> I'm seeing these errors in the log when couch just stops replicating (even though it
> appears in _active_tasks it doesn't checkpoint again, even with _replicate being called
> every 5 mins).
> It seems to occur when replicating from a couch 1.1.1 (I have seen it on 1.0.3 machines
> replicating from 1.1.1).
> It definitely is not replicating to itself, but I suspect it is a problem in PUTing the
> _local doc on the source db.
> Log here (snipped from host33 couch.log): http://www.friendpaste.com/3FLgRFzOEAkkKazLbc7Jgw
> For that log, our replication cron does an ssh to host33, then curls it to replicate from
> host01 to the database (with no host specified) as continuous pull replication.
> We have occasionally seen slow PUTing of documents on that database (and only that
> database), which can take upwards of 10 seconds (via futon or our app), as it is a creaking
> database that has a scarred history of documents that contain many (thousands) of conflicts.
> Could this occasional slow PUT manifest itself as this error in the log?
> As a workaround to keep replication flowing, would it restart this replication id if
> the curl called the cancelling of the replication ("cancel":true) followed by the starting
> of replication?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
