incubator-couchdb-user mailing list archives

From Robert Newson <>
Subject Re: Scaling with filtered replication
Date Tue, 09 Jul 2013 15:50:40 GMT
It's not true. Passing replication through a filter is a linear
slowdown (the cost of passing each document to SpiderMonkey for
evaluation), nothing more. Filtered replication is as
incremental/resumable as non-filtered replication.
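That per-document cost comes from evaluating a filter function against each
candidate document. A minimal sketch of such a function, with a hypothetical
`domain` field and query parameter (in a real deployment the body would live
as a string under `filters` in a design document):

```javascript
// Sketch of a CouchDB filter function: called once per candidate
// document during replication, returning true to replicate it.
// The "domain" field and the ?domain=... query parameter are
// hypothetical names for illustration.
const byDomain = function (doc, req) {
  // req.query carries the query_params supplied with the replication
  return doc.domain === req.query.domain;
};
```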

Your scalability challenge is that the number of persistent
connections, replicator processes and bandwidth requirements all grow
as your user base grows, but that's not related to filtering.

Part of the remedy for this should be for the CouchDB development
team, or an intrepid contributor, to teach the _replicator manager that
it doesn't need to run all the replications at the same time. Given
that the _replicator database contains all the information needed to
start the replication, the manager could run a configurable number of
replication jobs at a time, and rotate gradually through all the jobs
in the database. The downside will be increased latency for items to
replicate, of course.
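The rotation idea above can be sketched roughly as follows; this is not the
replicator manager's actual code, just an illustration of running a
configurable number of jobs at a time (job objects and the `runBatch`
callback are hypothetical):

```javascript
// Sketch: cycle through all replication jobs, at most maxJobs at once.
// Each slice runs to completion (or a checkpoint) before the next
// slice starts, trading replication latency for fewer concurrent jobs.
function rotate(jobs, maxJobs, runBatch) {
  for (let i = 0; i < jobs.length; i += maxJobs) {
    runBatch(jobs.slice(i, i + maxJobs));
  }
}
```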


On 9 July 2013 16:36, Bill Foshay <> wrote:
> I was reading somewhere recently that filtered replication with couchdb
> doesn't scale well, and I was wondering if someone could verify whether this
> is true and, if so, whether there is a better way for us to architect our backend.
> Our company currently has a central couchdb on Iris Couch that houses all of
> our clients' data. Each of our clients also has its own couchdb, which
> replicates with this central db. The client dbs pull with a persistent
> filtered replication (so that they only pull their domain data, and only
> the last two weeks' worth of report data). They also have a persistent
> push replication set up to the central db. While the central db contains all
> domain and historical data, the individual client dbs only contain their
> domain data and the last two weeks' worth of report-related data. Each client
> generates about 5 GB of data per year (roughly 100,000 docs and 300,000 doc
> updates). We only have a few clients at this point, so we haven't really
> noticed any problems, but if this design is going to have problems scaling, I'd
> rather hold off on sales and make changes now. Is there a flaw in this
> approach or a better way to do things? I appreciate any help or advice!
> Thanks,
> Bill
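For reference, a continuous filtered pull of the kind Bill describes could be
expressed as a document in the _replicator database roughly like this; the
IDs, URLs, and filter name are hypothetical, and `filter`, `query_params`,
and `continuous` are standard _replicator document fields:

```javascript
// Hypothetical _replicator document for one client's continuous
// filtered pull from the central database.
const replicationDoc = {
  _id: "pull-acme-reports",
  source: "https://central.example.com/reports",   // central db (hypothetical URL)
  target: "http://localhost:5984/reports",         // client's local db
  filter: "replication/by_domain",                 // design-doc filter name
  query_params: { domain: "acme" },                // exposed to the filter as req.query
  continuous: true
};
```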
