couchdb-user mailing list archives

From Paul Okstad <poks...@gmail.com>
Subject Re: The state of filtered replication
Date Wed, 25 May 2016 16:39:21 GMT
This isn’t just a problem with filtered replication; it’s a major issue with the database-per-user
strategy (at least in v1.6.1, which I’m using). I’m also using a database-per-user design
with thousands of users and a single global database. If even a small fraction of the users (hundreds)
have continuous replications running from their user DB to the global DB, CPU utilization becomes
extremely high. This is without any JavaScript filter function on the replications.

Another huge issue with filtered replications is that they lose their place when they are
restarted. In other words, they don’t keep track of the sequence ID across server restarts, or
when the same replication is stopped and started again. For example, if I want to replicate only
the public documents from the global DB to the public DB, and I have a ton of documents in the
global DB, then each time I restart the filtered replication it begins again from sequence #1.
I’m guessing this is because CouchDB cannot tell whether the filter function has been modified
between replications, but this behavior is still very disappointing.
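If the built-in replicator keeps starting over, one workaround is to drive the filtered copy yourself from the source database's changes feed and persist the last processed sequence number. Below is a minimal sketch under stated assumptions: integer sequence numbers (as in CouchDB 1.x), a hypothetical `fetch_changes(since)` callable that wraps `GET /{db}/_changes?since=...&include_docs=true` and returns `(seq, doc)` pairs, and a JSON checkpoint file, which is purely an illustrative choice.

```python
import json
import os
from typing import Callable, Iterable, Tuple


def load_checkpoint(path: str) -> int:
    """Return the last processed sequence number, or 0 if none was saved yet."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["last_seq"]


def save_checkpoint(path: str, seq: int) -> None:
    """Write the checkpoint to a temp file, then rename it into place."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"last_seq": seq}, f)
    os.replace(tmp, path)  # atomic rename: a crash never leaves a half-written checkpoint


def run_filtered_copy(fetch_changes: Callable[[int], Iterable[Tuple[int, dict]]],
                      predicate: Callable[[dict], bool],
                      write: Callable[[dict], None],
                      checkpoint_path: str) -> int:
    """Copy matching docs changed since the stored checkpoint; return how many were copied."""
    since = load_checkpoint(checkpoint_path)
    copied = 0
    for seq, doc in fetch_changes(since):
        if predicate(doc):  # plays the role of the replication filter function
            write(doc)
            copied += 1
        # Record progress after each row, so a restart resumes here instead of at seq 1.
        save_checkpoint(checkpoint_path, seq)
    return copied
```

Saving after every row keeps the sketch short; a real worker would more likely checkpoint once per batch.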

--
Paul Okstad
http://pokstad.com



> On May 25, 2016, at 4:25 AM, Stefan Klein <st.fankl.in@gmail.com> wrote:
> 
> 2016-05-25 12:48 GMT+02:00 Stefan du Fresne <stefan@medicmobile.org>:
>
>> So to be clear, this is effectively replacing replication (where the
>> client negotiates with the server for a collection of changes to download)
>> with a daemon that builds up a collection of documents that each client
>> should get (and presumably also delete), which clients can then query for
>> when they’re able?
>> 
> 
> Sorry, I didn't describe that well enough.
> 
> On the server side we have one big database containing all documents, and
> one DB for each user.
> The clients always replicate to and from their individual user DB,
> unfiltered, so each user's DB is a 1:1 copy of the PouchDB (or similar)
> database on their client.
> 
> Initially we set up a filtered replication for each user, from the server's
> main database to the server-side copy of that user's database.
> With this we ran into performance problems, and sooner or later we probably
> would have run into issues with open file descriptors.
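For illustration, the per-user setup described here might look like one document per user in the `_replicator` database, each a continuous filtered replication from the main DB. The fields `source`, `target`, `continuous`, `filter` and `query_params` are standard replication-document fields; the server URL, the database names, the `userdb-` prefix, the document IDs and the `app/by_owner` filter (a JavaScript filter in a design document named `app`) are hypothetical, and usernames are assumed to be simple lowercase strings. Each such replication plausibly reads the main DB's changes feed and runs the filter on every change, so the work grows roughly with users × changes.

```python
import json
from typing import Dict, List


def filtered_replication_doc(username: str,
                             server: str = "http://localhost:5984",
                             main_db: str = "main") -> Dict:
    """Build a _replicator document for one user's filtered, continuous replication."""
    return {
        "_id": f"main-to-userdb-{username}",      # hypothetical ID scheme
        "source": f"{server}/{main_db}",
        "target": f"{server}/userdb-{username}",  # hypothetical per-user DB name
        "filter": "app/by_owner",                 # hypothetical design doc / filter name
        "query_params": {"owner": username},      # visible to the filter as req.query
        "continuous": True,
    }


def replication_docs(usernames: List[str]) -> List[Dict]:
    # One continuous replication per user: N users means N long-running
    # replications, each filtering the main DB's changes feed on its own.
    return [filtered_replication_doc(u) for u in usernames]


if __name__ == "__main__":
    print(json.dumps(replication_docs(["alice", "bob"]), indent=2))
```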
> 
> So instead, we listen to the changes feed of the main database and
> distribute the documents to the users' DBs on the server, which are then
> synced with the clients.
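A rough sketch of that distribution step follows. It assumes each main-DB document lists its users in a hypothetical `owners` field, that rows come from the main DB's `_changes` feed with `include_docs=true` (deleted rows carry `"deleted": true` and a stub doc without the other fields), and the same hypothetical `userdb-<name>` naming. Because a deleted document no longer says who owned it, the sketch remembers where each document was sent. Actually writing the batches (for example via `_bulk_docs`) is left out, and a deletion in a user DB would also need that copy's current `_rev`.

```python
from collections import defaultdict
from typing import Dict, List, Set


def userdb_name(username: str) -> str:
    return "userdb-" + username  # hypothetical naming scheme


def distribute(rows: List[dict], sent_to: Dict[str, Set[str]]) -> Dict[str, List[dict]]:
    """Group main-DB change rows into per-user-DB batches.

    `sent_to` maps doc ID -> user DBs that currently hold a copy; it is
    updated in place so later deletions and owner changes can be routed.
    """
    batches: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        doc = row["doc"]
        doc_id = doc["_id"]
        previous = sent_to.get(doc_id, set())
        if row.get("deleted"):
            current: Set[str] = set()
        else:
            current = {userdb_name(u) for u in doc.get("owners", [])}
        # Current owners get the new version of the document.
        for db in sorted(current):
            batches[db].append(doc)
        # Former owners (everyone, if the doc was deleted) get a deletion marker.
        for db in sorted(previous - current):
            batches[db].append({"_id": doc_id, "_deleted": True})
        if current:
            sent_to[doc_id] = current
        else:
            sent_to.pop(doc_id, None)
    return dict(batches)
```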
> 
> Note: this is only for documents the users actually work with (as in
> possibly modify); for queries on the data, we query views on the main
> database.
> 
> For the way back, we listen to _dbchanges, so we get an event for
> changes to the user DBs; we then fetch that change from the user's DB and
> determine what to do with it.
> We do not replicate users' changes back to the main database; instead, we
> have an internal API that evaluates all kinds of constraints on user input.
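The way back might be sketched as below. The assumptions: update events shaped like rows of CouchDB's `_db_updates` feed (a `db_name` and a `type` such as `"updated"`), the hypothetical `userdb-<name>` naming, and a hypothetical `fetch_changed_docs(db_name)` callable that returns the changed documents (for example by reading that DB's `_changes` feed from a stored checkpoint). The two checks in `validate` are placeholders for whatever application-specific constraints the internal API enforces.

```python
from typing import Callable, Iterable, List, Optional

USERDB_PREFIX = "userdb-"  # hypothetical naming scheme


def user_for_event(event: dict) -> Optional[str]:
    """Return the username if the event is an update to a user DB, else None."""
    name = event.get("db_name", "")
    if event.get("type") != "updated" or not name.startswith(USERDB_PREFIX):
        return None
    return name[len(USERDB_PREFIX):]


def validate(username: str, doc: dict) -> List[str]:
    """Placeholder constraints; a real app would check its own business rules."""
    errors = []
    if username not in doc.get("owners", []):
        errors.append("user does not own this document")
    title = doc.get("title")
    if not isinstance(title, str) or not title.strip():
        errors.append("title must be a non-empty string")
    return errors


def handle_event(event: dict,
                 fetch_changed_docs: Callable[[str], Iterable[dict]],
                 accept: Callable[[str, dict], None],
                 reject: Callable[[str, dict, List[str]], None]) -> int:
    """Validate every changed doc in the user DB named by `event`; return how many were handled."""
    user = user_for_event(event)
    if user is None:
        return 0  # not a user DB, or not an update: ignore
    handled = 0
    for doc in fetch_changed_docs(event["db_name"]):
        errors = validate(user, doc)
        if errors:
            reject(user, doc, errors)
        else:
            accept(user, doc)
        handled += 1
    return handled
```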
> If you do not have to check user input, you could certainly listen to
> _dbchanges and "blindly" run a one-shot replication from the changed DB to
> your main DB.
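For the "blind" variant, a one-shot replication can be started with a POST to CouchDB's `/_replicate` endpoint; leaving out `"continuous": true` makes it run once and stop. The minimal sketch below only builds the request. It assumes a server at `http://localhost:5984`, a main DB named `main` (hypothetical), and no authentication.

```python
import json
import urllib.parse
import urllib.request


def one_shot_replication_request(changed_db: str,
                                 main_db: str = "main",
                                 server: str = "http://localhost:5984") -> urllib.request.Request:
    """Build (but do not send) a POST /_replicate request from changed_db to main_db."""
    def db_url(name: str) -> str:
        # Database names may contain '/', which must be percent-encoded in URLs.
        return f"{server}/{urllib.parse.quote(name, safe='')}"

    body = {"source": db_url(changed_db), "target": db_url(main_db)}  # no "continuous": runs once
    return urllib.request.Request(
        f"{server}/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it would be `urllib.request.urlopen(req)`, triggered once per update event for the changed user DB.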
> 
> -- 
> Stefan

