couchdb-dev mailing list archives

From: Nathan Vander Wilt <>
Subject: Re: Round-robin replication [was Half-baked idea: incremental virtual databases]
Date: Tue, 05 Feb 2013 18:38:45 GMT
+1 on round-robin continuous replication. Ideally it'd allow replications to
specify a relative priority, e.g. replication of log alerts or chat messages
might want lower latency (higher priority) than a ddoc deployment or user
backup. For now I'm just going to implement my own duct-tape version of this,
using cron jobs to trigger non-continuous replications.

FWIW, I'm sharing, with my client's permission, the script I've been using to
load-test continuous filtered replication to/from a central master:

The test script sets up N+1 databases, writes documents to one as the master
while replicating to the other N, and also "short-polls" _changes to roughly
simulate general load on top of the longpolling the application does. On OS X
I can only get to around 250 users due to some FD_SETSIZE stuff with Erlang,
but it remains stable if I keep it under that limit; however, it takes the
user database replications a *long* time to all get caught up (some don't
even start until a few minutes after the changes stop).


On Feb 4, 2013, at 2:50 PM, Robert Newson wrote:

> I had a mind to teach the _replicator db this trick. Since we have a
> record of everything we need to resume a replication there's no reason
> for a one-to-one correspondence between a _replicator doc and a
> replicator process. We can simply run N of them for a bit (say, a
> batch of 1000 updates) and then switch to others. The internal
> db_updated mechanism is a good way to notice when we might have
> updates worth sending but it's only half the story. A round-robin over
> all _replicator docs (other than one-shot ones, of course) seems a
> really neat trick to me.
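A toy model of that batched round-robin (pure simulation, nothing CouchDB-specific) might cycle the doc list like this:

```javascript
// Toy model of the idea above: run each replication for a batch of
// updates, then switch to others, instead of one standing process
// per _replicator doc.
function roundRobin(docs, maxActive, batches) {
  var order = [];
  var queue = docs.slice();                   // continuous replications only
  for (var i = 0; i < batches; i++) {
    var active = queue.splice(0, maxActive);  // run N of them for a bit
    order.push(active.slice());
    queue = queue.concat(active);             // back of the line afterwards
  }
  return order;
}
```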
> B.
> On 4 February 2013 22:39, Jan Lehnardt <> wrote:
>> On Feb 4, 2013, at 23:14 , Nathan Vander Wilt <> wrote:
>>> On Jan 29, 2013, at 5:53 PM, Nathan Vander Wilt wrote:
>>>> So I've heard from both hosting providers that it is fine, but I also
>>>> managed to take both of their shared services "down" with only about 100
>>>> users (200 continuous filtered replications). I'm only now at the point
>>>> where I have tooling to build out arbitrarily large tests on my local
>>>> machine to see the stats for myself, but as I understand it the issue is
>>>> that every replication needs at least one couchjs process to do its
>>>> filtering for it.
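For context, the filter each of those couchjs processes evaluates is an ordinary design-doc function; a minimal per-user filter might look like this (field and parameter names are hypothetical):

```javascript
// Hypothetical per-user replication filter, the kind of function a
// couchjs worker runs (e.g. filters.by_user in a design doc). Every
// change pulled from the source passes through it before replication.
function byUser(doc, req) {
  return doc.owner === req.query.user;
}
```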
>>>> So rather than inactive users mostly just taking up disk space, they each
>>>> instead cost a full-fledged process's worth of memory and system
>>>> resources, all the time. As I understand it, this isn't much better on
>>>> BigCouch either: since the data is scattered ± evenly across the
>>>> machines, the *computation* is spread, but each node in the cluster
>>>> still needs k*numberOfUsers couchjs processes running. So it's "scalable"
>>>> in the sense that traditional databases are scalable: vertically, by
>>>> buying machines with more and more memory.
>>>> Again, I am still working on getting a better feel for the costs
>>>> involved, but the basic theory with a master-to-N hub is not a great
>>>> start: every change needs to be processed by every one of the N
>>>> replications. So if a user writes a document that ends up in the master
>>>> database, every other user's filter function needs to process that change
>>>> coming out of master. Even when N users are generating 0 (instead of M)
>>>> changes, it's not doing M*N work, but there are still always 2*N open
>>>> connections and supporting processes providing a nasty baseline for
>>>> large values of N.
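Concretely, the hub-and-spoke baseline works out as follows (toy numbers):

```javascript
// Back-of-the-envelope for the master-to-N hub: every change written
// to master is examined by every user's filter, and each user holds
// a pull and a push replication against the hub.
function filterCalls(changesOnMaster, users) {
  return changesOnMaster * users;   // M * N filter invocations
}

function baselineConnections(users) {
  return 2 * users;                 // per-user pull + push
}
```

E.g. 100 writes against 250 users is already 25,000 filter invocations, on top of 500 standing connections.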
>>> Looks like I was wrong about needing enough RAM for one couchjs process
>>> per replication.
>>> CouchDB maintains a pool of (no more than
>>> query_server_config/os_process_limit) couchjs processes, and work is
>>> divvied out amongst them as necessary. I found a little meta-discussion
>>> of this system at and the code uses it
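The relevant knob lives in the server config; e.g. in local.ini (the value here is illustrative, not a recommendation):

```ini
; local.ini — cap on the pooled couchjs workers that all filtered
; replications share (value below is illustrative)
[query_server_config]
os_process_limit = 25
```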
>>> On my laptop, I was able to spin up 250 users without issue. Beyond that,
>>> I start running into ± hardcoded system resource limits that Erlang has
>>> under Mac OS X, but from what I've seen the only theoretical scalability
>>> issue with going beyond that on Linux/Windows would be response times, as
>>> the worker processes become more and more saturated.
>>> It still seems wise to implement tiered replications for communicating
>>> between thousands of *active* user databases, but that seems reasonable
>>> to me.
>> CouchDB’s design is obviously lacking here.
>> For immediate relief, I’ll propose the usual jackhammer of unpopular
>> responses: write your filters in Erlang. (sorry :)
>> For the future: we already see progress in improving the view server
>> situation. Once we get to a more desirable setup (yaynode/v8), we can
>> improve the view server communication; there is no reason you’d need a
>> single JS OS process per active replication, and we should absolutely fix
>> that.
>> --
>> Another angle is the replicator. I know Jason Smith has a prototype of
>> this in Node, and it works. Instead of maintaining N active replications,
>> we just keep a maximum number of active connections and cycle out ones
>> that are currently inactive. The DbUpdateNotification mechanism should
>> make this relatively straightforward. There is added overhead for setting
>> up and tearing down replications, but we can make better use of resources
>> and not clog things with inactive replications. Especially in a
>> db-per-user scenario, most replications don’t see much of an update most
>> of the time; they should be inactive until data is written to any of the
>> source databases. The mechanics in CouchDB are all there for this; we just
>> need to write it.
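A toy model of that cycling behavior (pure simulation; the API shape is made up for illustration): keep at most `cap` replications active, and let a db-updated notification (re)activate a source, evicting the longest-inactive one when over the cap.

```javascript
// Toy model of the cycling idea: at most `cap` active replications;
// a db-updated notification refreshes or activates that source and
// may cycle out the longest-inactive one. API shape is hypothetical.
function makeScheduler(cap) {
  var active = [];                        // most-recently-updated last
  return {
    dbUpdated: function (db) {
      var i = active.indexOf(db);
      if (i !== -1) active.splice(i, 1);  // already active: refresh
      active.push(db);
      if (active.length > cap) return active.shift(); // cycled out
      return null;                        // nothing evicted
    },
    active: function () { return active.slice(); }
  };
}
```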
>> --
>> Nate, thanks for sharing your findings and for bearing with us, despite
>> your very understandable frustrations. It is people like you that allow us
>> to make CouchDB better!
>> Best
>> Jan
>> --
