incubator-couchdb-user mailing list archives

From Filipe David Manana <fdman...@apache.org>
Subject Re: CouchDB 1.2 Replicator Changes
Date Thu, 26 Apr 2012 12:00:50 GMT
On Wed, Apr 25, 2012 at 7:53 PM, Chris Stockton <cstockton@godaddy.com> wrote:
> Hello,

Hi Chris

>
> I was very excited when I was reading the replicator changes in the
> release notes[1], specifically because I saw "Number of worker
> processes" and thought that maybe it is now pooled. I am very
> glad to see the improvements to the replicator and very much appreciate
> the work that has been done on it, but I am a bit confused after reading
> the more detailed parameters for the new replicator[2]. It seems that the
> configuration options and worker processes are for a specific database,
> with some decently high defaults, such as 20 "http_connections". From
> what I gather reading this is per database, or is it per server?

It's per replication (what you call "per database" if I understood
correctly). It's specifically mentioned at
http://wiki.apache.org/couchdb/Replication#New_features_introduced_in_CouchDB_1.2.0
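
To illustrate the per-replication scope, here is a rough sketch in Python
(not taken from the CouchDB sources). It assumes, following the wiki page
above, that settings such as "http_connections" and "worker_processes" can
be given in the replication object itself. The database names and the
helper functions are hypothetical, and the connection count is only a
back-of-the-envelope upper bound that assumes every replication opens its
full pool.

```python
import json


def replication_body(source, target, http_connections=1, worker_processes=1):
    """Build the JSON body for one replication with its own pool settings.

    Assumption: the per-replication option names follow the CouchDB 1.2
    wiki page; check them against your CouchDB version.
    """
    return json.dumps(
        {
            "source": source,
            "target": target,
            "http_connections": http_connections,
            "worker_processes": worker_processes,
        },
        sort_keys=True,
    )


def max_connections(n_replications, http_connections):
    """Upper bound on connections if every replication fills its own pool."""
    return n_replications * http_connections


# Hypothetical database names, one replication per database.
body = replication_body("http://master:5984/db_0001", "db_0001")

# 5000 replications at the default of 20 connections each.
default_bound = max_connections(5000, 20)   # 100000
# The same 5000 replications limited to 1 connection each.
lowered_bound = max_connections(5000, 1)    # 5000
```

With the defaults, 5000 replications could reach the 100k figure from the
question. Lowering the setting to 1 brings the bound back to 5000.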

>
> I have written to this list before about how I would love to see a
> server-wide replicator, something that creates a configurable pool of
> connections for server relationships. We scale with many
> databases instead of having one giant database. The problem we have
> faced is that at only 2K databases we had to make some configuration
> tweaks to allow replication to run from our Master -> Failover ->
> Backup machine, and at 5000 we were forced to take our Backup
> machine out of the picture because it put a requirement of around 10K
> TCP connections on our Failover machine. It was simply too much strain,
> even for very large enterprise database servers.
>
> So my question here is does the new replicator pool an entire server,
> solving our growth problem with MANY databases, or does it simply add
> additional strain with more workers (from 5000 TCP connections to 100k)?
> If it does indeed add workers instead of reducing them, and I were
> to lower the defaults to 1 connection per database, is the new
> replicator designed in such a way that it will still offer at least
> comparable performance to the 1.1 replicator, or could I possibly incur
> a penalty because the new architecture is designed and expected to have
> a modest pool size?

A big difference is that each replication has its own set of dedicated
connections (for better error isolation and performance).

Keep in mind however, that if you're doing pull-style replications,
there's always one connection (per replication) fully dedicated to the
remote _changes feed. This is true for both replicators.

I think you'll have to solve your problem by keeping only N
non-continuous replications active at any time and doing your own
round-robin scheduling in the meantime.
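
A minimal sketch of that idea in Python, under stated assumptions: each
replication is a one-shot (non-continuous) POST to /_replicate, and a
thread pool of size N caps how many are active at once. The names
replicate_once and run_round, the URLs, and the timeout are hypothetical
and not part of CouchDB. Treat it as a starting point rather than a
tested scheduler.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def replicate_once(couch_url, source, target, timeout=600):
    """Trigger one non-continuous replication and wait for it to finish."""
    body = json.dumps({"source": source, "target": target}).encode("utf-8")
    req = urllib.request.Request(
        couch_url.rstrip("/") + "/_replicate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def run_round(pairs, max_active, replicate):
    """Run one pass over all (source, target) pairs.

    At most max_active replications run at the same time, because the
    pool has exactly that many worker threads. Failures are recorded in
    the result instead of aborting the whole round.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_active) as pool:
        futures = {pool.submit(replicate, s, t): (s, t) for s, t in pairs}
        for fut, pair in futures.items():
            try:
                results[pair] = fut.result()
            except Exception as exc:  # keep going; retry in the next round
                results[pair] = exc
    return results
```

A caller would build the list of pairs (one per database), pass something
like `lambda s, t: replicate_once("http://failover:5984", s, t)` as the
`replicate` argument, and call run_round in a loop with a pause between
rounds. Passing the replicate function in makes it easy to try the
scheduling logic without a live server.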

Some time ago I started work on making the pooling more
configurable, namely to allow choosing between a dedicated
per-replication pool and a pool of connections shared among multiple
replications, among other features. It's not finished, however, and
only the following is online:

https://github.com/fdmanana/couchdb/tree/lhttpc

regards


>
> Kind Regards,
>
> -Chris
>
> [1]
> http://www.apache.org/dist/couchdb/notes/1.2.0/apache-couchdb-1.2.0.html
> [2] http://wiki.apache.org/couchdb/Replication
>
>

-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."
