incubator-couchdb-user mailing list archives

From Jens Alfke <j...@couchbase.com>
Subject Re: Scaling with filtered replication
Date Tue, 09 Jul 2013 18:43:11 GMT

On Jul 9, 2013, at 11:09 AM, Robert Newson <rnewson@apache.org> wrote:

> If you didn’t have filters at all, but still had n^2 replications, you've still got
> a scaling problem, it's just not directly related to the filtering overhead.

Yes, I agree that CouchDB filtering isn't significantly more CPU-intensive than not filtering :)
and it's likely cheaper once you include the savings from not transmitting the filtered-out revisions.

But if you _do_ filter heavily, so that any one client sees only a small fraction of the total
update traffic, the filtering overhead starts to dominate as the number of clients grows,
because the server is still fetching, decoding, and running a JS function on (say) 100 or 1000
rejected documents for every one that does get sent. That’s a pretty typical scenario for
a system with mobile or desktop clients — think of Exchange or SalesForce.com or Words With
Friends; what fraction of the total server-side updates does any one client see?
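To make the cost concrete, here's a sketch of the kind of per-client filter I mean. The `owner` field and `owner` query parameter are made up for illustration; the point is that CouchDB evaluates a function like this once per (change, replication) pair, so with many clients each seeing ~1% of updates, the vast majority of invocations do work just to return false:

```javascript
// Hypothetical filter function, as it might appear in a design doc.
// req.query holds the query parameters the replicator passed along.
function filterByOwner(doc, req) {
  if (doc._deleted) return true;          // deletions still need to replicate
  return doc.owner === req.query.owner;   // keep only this client's documents
}
```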

The alternative is the hypothetical view-based filtering that’s been talked about here before,
where the source db would iterate over a pre-filtered list of revisions from a view index
rather than going through the entire by-sequence index. Or the actual-but-alpha-quality “channels”
mechanism we’re using in the Couchbase Sync Gateway.
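For the view-based variant, the idea would be something like the following map function (hypothetical; in a real design doc `emit()` is a global provided by the view server, and it's passed as a parameter here only to keep the sketch self-contained). A replicator could then read just the view rows keyed by one client's owner id, instead of scanning the entire by-sequence index:

```javascript
// Hypothetical map function for a by-owner view. Each row is keyed by
// the document's owner, so one client's revisions are pre-filtered
// and contiguous in the index.
function mapByOwner(doc, emit) {
  if (doc.owner) emit(doc.owner, doc._rev);
}
```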

Anyway. I’m not meaning to harsh on filtering in general, and in the OP’s case it sounds
like the target databases are corporate customers rather than end-users, so there probably
aren’t nearly as many of them as in the scenarios I’m talking about.

—Jens