couchdb-user mailing list archives

From: Jens Alfke <j...@couchbase.com>
Subject: Re: Scaling with filtered replication
Date: Fri, 12 Jul 2013 18:58:58 GMT

On Jul 11, 2013, at 9:25 AM, Bill Foshay <bill.foshay@noteandgo.com> wrote:

> Ignoring filtering, is there any idea roughly how many persistent 
> replications can be running before it starts to hurt performance? I know 
> this is a vague question, highly dependent on the system resources of the 
> machine hosting the database, the number of updates being made, etc. I'm 
> just trying to get a rough idea if possible. Are we talking like on the 
> order of 100 replications, 1000s, etc? 

Client pushes aren’t expensive: they don’t consume resources on the server beyond occasional
POSTs to _revs_diff and _bulk_docs when the client has new revisions to upload.
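
For concreteness, one push cycle amounts to roughly the two requests below. This isn’t the
replicator’s actual code, just a minimal Python sketch using the requests library, with a
made-up server URL, database name, document ID and revision:

    import requests

    COUCH = "http://localhost:5984/mydb"   # hypothetical server and database

    # 1. Ask the target which of our revisions it is missing.
    local_revs = {"doc-001": ["2-7051cbe5c8faecd085a3fa619e6e6337"]}
    diff = requests.post(COUCH + "/_revs_diff", json=local_revs).json()
    # e.g. {"doc-001": {"missing": ["2-7051cbe5c8faecd085a3fa619e6e6337"]}}

    # 2. Upload just the missing revisions, keeping their revision IDs;
    #    new_edits=false is what makes this a replication-style write.
    missing = [{"_id": "doc-001",
                "_rev": "2-7051cbe5c8faecd085a3fa619e6e6337",
                "value": 42}]
    requests.post(COUCH + "/_bulk_docs",
                  json={"docs": missing, "new_edits": False})

Both sockets close as soon as the responses come back, which is why a mostly-idle push
replication costs the server next to nothing.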

The overhead of client pulls comes from:
* Open TCP sockets (for _changes feeds) — this is the same scaling problem that large-scale
Comet, IMAP, XMPP, etc. servers have. There are hardware and kernel issues to consider if
you need to handle tens or hundreds of thousands of open TCP connections per host.
* User-space server state for all those connections — fortunately Erlang is kind of the
poster child of scalability here. I don’t know what extra overhead CouchDB adds.
* Writing to all those sockets whenever a revision is added — I don’t know how bad this
gets. It’s on the order of one packet of payload per active listener. In response, each
listener will probably send a GET request to retrieve the new revision (see the sketch after
this list). In extreme cases those GETs could cause the same thundering-herd problem seen with
RSS, where updating a feed produces a zillion simultaneous hits on the new article.
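
To make the pull side concrete, each listening client is doing something like the sketch
below. Again, this is only a rough Python illustration using the requests library, with a
made-up server URL and database name; a filtered pull would simply add a filter parameter
to the _changes request:

    import json
    import requests

    COUCH = "http://localhost:5984/mydb"   # hypothetical server and database

    # This GET is the long-lived socket: it stays open and the server writes
    # one JSON line to it per change (plus blank heartbeat lines).
    feed = requests.get(COUCH + "/_changes",
                        params={"feed": "continuous",
                                "heartbeat": 30000,
                                "since": 0},
                        stream=True)

    for line in feed.iter_lines():
        if not line:                       # heartbeat; keeps idle sockets alive
            continue
        change = json.loads(line)
        if "id" not in change:             # e.g. a trailing last_seq line
            continue
        # The follow-up request mentioned above: fetch each new revision.
        for rev in change.get("changes", []):
            doc = requests.get(COUCH + "/" + change["id"],
                               params={"rev": rev["rev"]}).json()
            # ...merge doc into the client's local database...

Multiply that one open socket, plus the burst of follow-up GETs after each update, by the
number of pulling clients and you get the scaling concerns listed above.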

I don’t know much about the innards of CouchDB (or Erlang) so I can’t get more specific
about these…

—Jens