incubator-couchdb-user mailing list archives

From Nicholas Westlake <nicholasred...@gmail.com>
Subject Re: Limits on Continuous Replication
Date Thu, 10 Oct 2013 07:44:10 GMT
Is "serious punishment for restarting couch" the only problem with this design?

-NRW

On 10 Oct 2013, at 2:31 AM, Tibor Gemes <tibber@gmail.com> wrote:

> I see one problem. The startup case.
> 
> Let's suppose that the couchdb is shut down properly. On startup it will
> scan all _replicator documents and for each document it will start a
> filtered replication. Each filtered replication will start scanning the full
> _changes feed and execute the filter for each item in the feed. If the
> item passes, it will be checked against the target db, and if it is newer
> it will be replicated. In normal cases the item won't be newer so the
> replication won't happen, however for each replicator document the whole
> _changes feed will still be read and filtered. This will put a huge
> load on the system.
> 
> Hth,
> Tib
> 
> On 10 October 2013 09:18, Nicholas Westlake <nicholasredlin@gmail.com> wrote:
> 
>> I was advised in #couchdb that my question is better suited to the mailing
>> list.
>> 
>> I'm looking at an app design that uses quite a few replications. I'm
>> wondering what limits I'll run into from couch. Ideas for how to work
>> around those limits would be a bonus. :)
>> 
>> I have 3 kinds of database:
>> 
>> 1) user # One of these for each user (readable and writeable only by the
>> user who owns it)
>> 2) project-public # One of these for each project (writeable only by the
>> replicator, readable by everyone)
>> 3) project-admin # One of these for each project (writeable only by the
>> replicator, readable only by users granted permission)
>> 
>> When a user is added to a project, filtered continuous replication starts
>> from their "user" database to both the "project-public" and "project-admin"
>> databases. The filter functions check for simple conditions like
>> "doc.committed === true". Each replication will be stored in the _replicator
>> database so that the continuous replication survives a restart.
>> 
>> The important behaviors:
>> 
>> - When a user changes data, it should propagate to the project
>> databases in under 30 seconds.
>> - Project data should be access controlled between public data and
>> admin-only data. I believe separate databases with _security set
>> accordingly is the only way to achieve this.
>> - Restarting couch shouldn't break anything.
>> 
>> Some realistic maximums. It's unlikely that:
>> 
>> - any "project-admin" or "project-public" databases will exceed 8
>> megabytes, compacted.
>> - any "user" database will exceed 100 megabytes, compacted.
>> - the user count and project count will ever exceed 100,000 and 40,000
>> respectively.
>> - more than 4,000 users will work on any single project.
>> - more than 2,000 users will be making changes at any given moment.
>> 
>> Just in case it matters:
>> 
>> - The external process that will be creating databases and entries in the
>> _replicator database is node.js.
>> 
>> The continuous replication count (at most) would be 320 million (40,000
>> projects * 4,000 users/project * 2 replications/user). This seems like it
>> would break something. :)
>> 
>> Should this work? Or: what's the canonical (or a good) alternative to this
>> design with couch?
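
For reference, the setup described in the quoted message (one filtered continuous replication per user and project database, stored in _replicator) could be sketched roughly as below. This is only an illustration under assumptions: the design document name "project", the filter name "committed", the database names, the _id scheme, and the helper buildReplicationDoc are all made up here and do not come from the thread. The fields source, target, filter, and continuous are the usual _replicator document fields.

```javascript
// Filter source kept as a string, because design documents store functions
// as text. CouchDB calls a filter with (doc, req) and passes the document
// on only when the function returns true.
const committedFilterSource =
  "function (doc, req) { return doc.committed === true; }";

// Design document to store in each "user" database.
// The name "project" is an assumption for this sketch.
const designDoc = {
  _id: "_design/project",
  filters: { committed: committedFilterSource }
};

// Hypothetical helper: builds one _replicator document for a
// user database / project database pair.
function buildReplicationDoc(userDb, projectDb, filterName) {
  return {
    _id: userDb + "_to_" + projectDb, // assumed id scheme
    source: userDb,
    target: projectDb,
    filter: filterName,               // "<design doc name>/<filter name>"
    continuous: true
  };
}

// Adding one user to one project yields two replication documents.
const docs = [
  buildReplicationDoc("user-alice", "project-42-public", "project/committed"),
  buildReplicationDoc("user-alice", "project-42-admin", "project/committed")
];

console.log(JSON.stringify(docs, null, 2));
```

In practice the public and admin targets would probably use different filters; the same one is used twice here only to keep the sketch short.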

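The worst-case replication count from the quoted message, and the startup cost described in the reply, can be put into numbers with a small sketch. The three maximums come from the thread. The function startupFilterCalls and the figure of 1,000 changes per user database are assumptions for illustration only, and the model simply takes the reply at its word that every replication re-reads its whole source _changes feed on startup.

```javascript
// Maximums taken from the quoted message.
const maxProjects = 40000;
const maxUsersPerProject = 4000;
const replicationsPerMembership = 2; // one to project-public, one to project-admin

// Upper bound on the number of continuous replications.
const maxReplications =
  maxProjects * maxUsersPerProject * replicationsPerMembership;

// Hypothetical cost model: each replication reads its whole source
// _changes feed once and runs the filter once per row.
function startupFilterCalls(replicationCount, changesPerSourceDb) {
  return replicationCount * changesPerSourceDb;
}

console.log(maxReplications); // 320000000

// Assumed figure: 1,000 changes in every "user" database.
console.log(startupFilterCalls(maxReplications, 1000)); // 320000000000
```

Even with far fewer replications than the upper bound, the product grows with both the number of _replicator documents and the length of each source feed, which is the concern raised in the reply.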
