couchdb-user mailing list archives

From Filipe David Manana <>
Subject Re: CouchDB 1.2 Replicator Changes
Date Thu, 26 Apr 2012 12:00:50 GMT
On Wed, Apr 25, 2012 at 7:53 PM, Chris Stockton <> wrote:
> Hello,

Hi Chris

> I was very excited when I was reading the replicator changes in the
> release notes[1], specifically because I saw "Number of worker
> processes" and thought that maybe it is now pooled. Although I am very
> glad to see the improvements to the replicator and very much appreciate
> the work that has been done on it, I am a bit confused after reading the
> more detailed parameters for the new replicator[2]. It seems that the
> configuration options and worker processes are for a specific database,
> with some decently high defaults, such as 20 "http_connections". From
> what I gather from reading this, it is per database; or is it per server?

It's per replication (what you call "per database" if I understood
correctly). It's specifically mentioned at

> I have sent emails in the past to this list how I would love to see a
> server wide replicator, something that created a configurable pool of
> connections for server relationships. For us, we scale with many
> databases instead of having one giant database. The problem we have
> faced is that when we reached only 2K databases, some configuration
> tweaks had to be made to allow replication to run from our Master ->
> Failover -> Backup machine. When we got up to 5000, we were forced to
> take our Backup machine out of the picture because it put a requirement
> of around 10K TCP connections on our Failover machine. It was simply too
> much strain, even for very large enterprise database servers.
> So my question here is: does the new replicator pool an entire server,
> solving our growth problem with MANY databases, or does it simply add
> additional strain with more workers (from 5000 tcp connections to 100k)?
> If it does indeed add workers instead of lowering them, and I were
> to lower the defaults to 1 connection per database, is the new
> replicator designed in such a way that it will still offer at least
> comparable performance to the 1.1 replicator, or could I possibly incur
> a penalty because the new architecture is designed and expected to have
> a modest pool size?

A big difference is that each replication has its own set of dedicated
connections (for better error isolation and performance).
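To put rough numbers on per-replication pooling: in the worst case every active replication fills its own dedicated pool, so the total is simply the number of replications times http_connections. A back-of-the-envelope sketch in Python, where connection_upper_bound is a hypothetical helper and the only input taken from this thread is the default of 20 http_connections:

```python
def connection_upper_bound(active_replications: int, http_connections: int = 20) -> int:
    """Worst-case HTTP connections when each replication has its own
    dedicated pool and fills it completely (hypothetical helper)."""
    if active_replications < 0:
        raise ValueError("active_replications must be >= 0")
    if http_connections < 1:
        raise ValueError("http_connections must be >= 1")
    # Dedicated pools share nothing, so the per-replication totals add up.
    return active_replications * http_connections


if __name__ == "__main__":
    print(connection_upper_bound(5000))     # 100000 with the default of 20
    print(connection_upper_bound(5000, 1))  # 5000
    print(connection_upper_bound(50))       # 1000
```

With 5000 databases at the default setting this gives the 100k connections Chris is worried about; lowering http_connections to 1 brings the bound back down to 5000.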

Keep in mind, however, that if you're doing pull-style replications,
there's always one connection (per replication) fully dedicated to the
remote _changes feed. This is true for both the 1.1 and 1.2 replicators.

I think you'll be able to solve your problem by keeping only N
non-continuous replications active at any time and doing your own
round-robin scheduling manually in the meantime.
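A minimal sketch of that bounded round-robin idea in Python: a thread pool of size max_active means at most N one-shot replications run at once, and each pass walks the database list in the same order. The helper names, the localhost URL and the master URL are hypothetical; the only CouchDB behaviour assumed is that a non-continuous POST to /_replicate with a "source" and "target" returns once that replication has finished.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def make_pull_runner(master: str) -> Callable[[str], None]:
    """Hypothetical: return a callable that pulls `db` from `master` into
    the local server with a one-shot (non-continuous) /_replicate call."""
    def run(db: str) -> None:
        body = json.dumps({"source": f"{master}/{db}", "target": db}).encode()
        req = urllib.request.Request(
            "http://127.0.0.1:5984/_replicate",  # placeholder local server
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        # Blocks until CouchDB answers, i.e. until this replication is done.
        with urllib.request.urlopen(req) as resp:
            resp.read()
    return run


def replicate_round_robin(
    databases: List[str],
    run_replication: Callable[[str], None],
    max_active: int,
    rounds: int = 1,
) -> List[str]:
    """Replicate every database once per round, with at most `max_active`
    replications in flight. Returns the order in which work was queued."""
    if max_active < 1:
        raise ValueError("max_active must be >= 1")
    order: List[str] = []
    with ThreadPoolExecutor(max_workers=max_active) as pool:
        for _ in range(rounds):
            # Queue the whole list in a fixed order; the pool runs only
            # `max_active` replications at a time and holds back the rest.
            futures = []
            for db in databases:
                order.append(db)
                futures.append(pool.submit(run_replication, db))
            # Finish the pass before starting the next one, so each database
            # is replicated at most once per round. Errors are re-raised here.
            for fut in futures:
                fut.result()
    return order
```

Usage would be something like replicate_round_robin(dbs, make_pull_runner("http://master:5984"), max_active=50, rounds=1), run from cron or a loop. Because pull replications each hold a _changes connection while active, capping the number of active replications also caps those connections.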

Some time ago I started work on making the pooling more configurable,
namely allowing a choice between a per-replication dedicated pool and a
shared pool of connections amongst multiple replications, amongst other
features. It's not finished, however, and only the following is online:


> Kind Regards,
> -Chris
> [1]
> [2]

Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."
