couchdb-dev mailing list archives

From "Randall Leeds (JIRA)" <>
Subject [jira] [Commented] (COUCHDB-1230) Replication slows down over time
Date Thu, 21 Jul 2011 20:08:57 GMT


Randall Leeds commented on COUCHDB-1230:

If you have the ability and time to compile and test trunk in the same way, I would very much
appreciate it. Filipe overhauled the replication code after 1.1 branched. It uses connection
pooling much more sensibly and it's easier to reason about.

Just make sure not to upgrade in place since the database will get upgraded in a way that
is not backwards-compatible.
Perhaps if you can set up a new target, we'll know whether this is still reproducible or outdated.

> Replication slows down over time
> --------------------------------
>                 Key: COUCHDB-1230
>                 URL:
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 1.0.2, 1.1
>         Environment: Ubuntu 10.04, 
>            Reporter: Paul Hirst
>         Attachments: sequence_number.png
> I have two databases which were replicated in the past. One is running 1.0.2; I shall
call this the source database. The other is running 1.1.0; I shall call this the target database.
> The source and target are bidirectionally replicated using a push and pull replication
from the target (using a couple of documents in the new _replicator database).
> The source database is in production and is getting changes applied to it from live systems.
The target is only participating in replication and isn't being used directly by any production
> The database has about 50 million documents, many of which will have been updated a handful
of times. The database is about 500G after compaction, but the source database is currently
at about 900G as it hasn't been compacted for a while.
> The databases were replicated in the past; however, this replication was torn down when
the target was upgraded from 1.0.2 to 1.1.0. When replication was re-enabled, the system wasn't
able to pick up where it left off and had to re-enumerate all the documents. This process
initially started quickly but after a while ground to a halt, such that the target actually
stopped making progress against the source database.
> I found that restarting replication starts the process running again at a decent speed
for a while. I did this by deleting and recreating the appropriate document in the _replicator
database on the target.  
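
The restart workaround described above (delete the replication document, then create it again) might be scripted roughly as follows. This is only a sketch, not the reporter's actual procedure: the host, the document id and the helper name `build_restart_requests` are hypothetical, and it assumes the document was first fetched so that its current `_rev` is known. It relies only on the standard CouchDB document API (`DELETE /{db}/{docid}?rev=...` and `PUT /{db}/{docid}`).

```python
import json
import urllib.parse
import urllib.request

# Fields the replicator (or CouchDB itself) adds to the stored document.
# They are dropped so the re-created document looks like a fresh one.
REPLICATOR_FIELDS = ("_rev", "_replication_id", "_replication_state",
                     "_replication_state_time")


def build_restart_requests(base_url, doc):
    """Return (delete_request, put_request) for a _replicator document.

    `doc` is the document as currently stored, including `_id` and `_rev`.
    Nothing is sent here; the caller decides when to issue the requests.
    """
    doc_url = "{}/_replicator/{}".format(
        base_url.rstrip("/"), urllib.parse.quote(doc["_id"], safe=""))
    delete_req = urllib.request.Request(
        "{}?rev={}".format(doc_url, urllib.parse.quote(doc["_rev"], safe="")),
        method="DELETE")
    fresh = {k: v for k, v in doc.items() if k not in REPLICATOR_FIELDS}
    put_req = urllib.request.Request(
        doc_url,
        data=json.dumps(fresh).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT")
    return delete_req, put_req
```

Passing the two requests to `urllib.request.urlopen`, delete first and then put, would perform the restart. Building the requests separately from sending them keeps the sketch easy to inspect before it touches a live `_replicator` database.
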
> I have graphed the last_seq of the target database against time for about a day, noting
when replication was manually restarted. I shall try to attach the graph if possible. It shows
a clear improvement in replication speed after restarting replication.
> I previously witnessed this behaviour between 1.0.2 databases, so I don't think it's a new
problem, but I didn't grab any stats at the time.
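
The slowdown shown in the sequence-number graph could also be expressed as a rate. The sketch below is an assumption about how one might process such measurements, not the method used to produce the attached graph: it takes (time, sequence number) samples, however they were collected (for example from the integer `update_seq` field that `GET /{db}` returns in CouchDB 1.x), and reports the sequence numbers gained per second in each interval. The function names and the threshold are hypothetical.

```python
def replication_rates(samples):
    """Turn (unix_time_seconds, seq) samples into per-interval rates.

    `samples` must be sorted by time. Returns a list of
    (interval_end_time, seqs_per_second) pairs.
    """
    rates = []
    for (t0, s0), (t1, s1) in zip(samples, samples[1:]):
        if t1 <= t0:
            # Skip duplicate or out-of-order timestamps.
            continue
        rates.append((t1, (s1 - s0) / (t1 - t0)))
    return rates


def stalled_intervals(rates, threshold):
    """Return the end times of intervals slower than `threshold` seqs/second."""
    return [t for t, rate in rates if rate < threshold]
```

For example, samples of `[(0, 0), (60, 6000), (120, 9000), (180, 9060)]` give rates of 100, 50 and 1 sequence numbers per second, and a threshold of 10 flags only the last interval. Comparing the rates just before and just after a manual restart would quantify the improvement described in the report.
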
