couchdb-user mailing list archives

From Will Holley <willhol...@gmail.com>
Subject Re: The state of filtered replication
Date Wed, 25 May 2016 15:45:24 GMT
Hi Stefan,

CouchDB 2.0 allows Mango selectors to be used as replication filters
(https://issues.apache.org/jira/browse/COUCHDB-2988) which offers
improvements over JavaScript-based filters. There are also many
improvements in the works around management of server-side
replications which should help if you end up with an architecture
where many per-user databases sync server-side with a "master"
database.
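
As a rough illustration of a selector-based filter, the sketch below builds a replication document with a "selector" field. The database URLs and the "owner" field are made up for the example; only "source", "target", "selector" and "continuous" are replication parameters.

```python
import json


def selector_replication_doc(source, target, selector, continuous=True):
    """Build a replication document that filters with a Mango selector."""
    return {
        "source": source,
        "target": target,
        "selector": selector,  # declarative filter, no JavaScript filter function
        "continuous": continuous,
    }


# Hypothetical URLs and field name, for illustration only.
doc = selector_replication_doc(
    "http://localhost:5984/master",
    "http://localhost:5984/userdb-alice",
    {"owner": "alice"},
)
print(json.dumps(doc, indent=2))
```

A document like this could be POSTed to /_replicate or saved in the _replicator database; the sketch only constructs it.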

I've seen a number of Cloudant customers struggle with this problem
and it pretty much comes down to "avoid filtering over large sets of
documents". The two common strategies seem to be:

1. Manually shard data (db per user / device)
2. Drive replication with something other than _changes

(1) can sometimes be done with a temporary database. For instance, you
may be able to create a database holding, say, "today's documents" using
a naming convention and have devices sync from that.
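
One possible naming convention, sketched under the assumption of one database per calendar day; the "docs" prefix and the helper names are hypothetical:

```python
from datetime import date, timedelta


def daily_db_name(day, prefix="docs"):
    """Return a per-day database name such as 'docs-2016-05-25'."""
    return f"{prefix}-{day.isoformat()}"


def dbs_to_sync(today, days_back=1, prefix="docs"):
    """Names a device would pull from: today plus the previous `days_back` days."""
    return [
        daily_db_name(today - timedelta(days=n), prefix)
        for n in range(days_back + 1)
    ]


print(dbs_to_sync(date(2016, 5, 25)))
```

Pulling from yesterday's database as well gives devices that were offline around midnight a chance to catch up; how many days to keep is a judgment call.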

(2) might involve bootstrapping the replication "since" parameter (i.e.
capture a sequence number daily and start replication from that point,
because you know there can be no documents for user A before that
date), or using a query to grab a set of documents and then either
filtering the replication by their _ids or writing your own replicator
which fetches these doc _ids instead of reading the _changes feed.
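
Both variants of (2) can be expressed as replication documents. The sketch below is only an illustration: "since_seq" and "doc_ids" are replication parameters, while the URLs, the checkpoint value, and the example query rows are placeholders.

```python
def replication_from_checkpoint(source, target, since_seq):
    """Start replication at a previously captured sequence number."""
    return {"source": source, "target": target, "since_seq": since_seq}


def doc_ids_from_rows(rows):
    """Collect unique document ids from query/view rows, keeping their order."""
    seen = set()
    ids = []
    for row in rows:
        doc_id = row["id"]
        if doc_id not in seen:
            seen.add(doc_id)
            ids.append(doc_id)
    return ids


def replication_for_doc_ids(source, target, doc_ids):
    """Replicate only the listed documents instead of filtering _changes."""
    return {"source": source, "target": target, "doc_ids": list(doc_ids)}


# Placeholder data: rows shaped like a view response, a made-up checkpoint.
rows = [
    {"id": "visit-1", "key": "alice", "value": None},
    {"id": "visit-2", "key": "alice", "value": None},
    {"id": "visit-1", "key": "alice", "value": None},
]
by_ids = replication_for_doc_ids(
    "http://localhost:5984/master", "http://localhost:5984/userdb-alice",
    doc_ids_from_rows(rows),
)
by_seq = replication_from_checkpoint(
    "http://localhost:5984/master", "http://localhost:5984/userdb-alice",
    "1234-placeholder",
)
print(by_ids["doc_ids"], by_seq["since_seq"])
```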

As Steve mentioned, both approaches have interesting edge cases around
dealing with documents that no longer pass a filter / match a query,
and you may need a strategy to remove documents from devices (PouchDB
doesn't support purging yet and you don't necessarily want deletions
to sync back to the server).

Cheers,

Will


On 25 May 2016 at 15:17, Steve Genoud <steve.genoud@gmail.com> wrote:
> Hi Stefan,
>
> There is a fork of CouchDB called Barrel that has support for view based
> replication: https://docs.barrel-db.org/docs/using-the-view-changes
>
> This seems to be mostly what you are looking for - a way to have a filter
> that is "cached" in some way. Note that it introduces some "interesting"
> edge cases (what happens when a document is removed from the view for
> instance) that you need to be aware of.
>
> Best,
> Steve
>
> On Wed, 25 May 2016 at 15:12 Varun Sikka <sikkavarun@gmail.com> wrote:
>
>> Hi Stefan,
>>
>> I ran into a similar issue, and I had to give up on the filtered
>> replication. I modified replication to use the doc_ids key and manually
>> send the doc_ids that I want to replicate.
>>
>> I am planning to move to the master - master - client db model, but again,
>> on low-end infrastructure, I am not very confident it could be managed. I am
>> really looking forward to a more robust solution to this replication
>> problem.
>>
>> Regards
>> Varun
>>
>> On Wed, May 25, 2016 at 3:25 PM, Stefan Klein <st.fankl.in@gmail.com>
>> wrote:
>>
>> > 2016-05-25 12:48 GMT+02:00 Stefan du Fresne <stefan@medicmobile.org>:
>> >
>> >
>> >
>> > > So to be clear, this is effectively replacing replication, where the
>> > > client negotiates with the server for a collection of changes to
>> > > download, with a daemon that builds up a collection of documents that
>> > > each client should get (and also presumably delete), which clients can
>> > > then query for when they’re able?
>> > >
>> >
>> > Sorry, didn't describe well enough.
>> >
>> > On the server side we have one big database containing all documents and
>> > one db for each user.
>> > The clients always replicate to and from their individual userdb,
>> > unfiltered. So the db for a user is a 1:1 copy of their pouchdb/... on
>> > their client.
>> >
>> > Initially we set up a filtered replication for each user from the server's
>> > main database to the server copy of the user's database.
>> > With this we ran into performance problems, and sooner or later we
>> > probably would have run into issues with open file descriptors.
>> >
>> > So what we do instead is listen to the changes of the main database and
>> > distribute the documents to the server's userdbs, which are then synced
>> > with the clients.
>> >
>> > Note: this is only for documents the users actually work with (as in
>> > possibly modify), for queries on the data we query views on the main
>> > database.
>> >
>> > For the way back, we listen to the _dbchanges, so we get an event for
>> > changes on the users' dbs, get that change from the user's db and
>> > determine what to do with it.
>> > We do not replicate users' changes back to the main database but rather
>> > have an internal API to evaluate all kinds of constraints on users' input.
>> > If you do not have to check users' input, you could certainly listen to
>> > _dbchanges and "blindly" one-shot replicate from the changed DB to your
>> > main DB.
>> >
>> > --
>> > Stefan
>> >
>>
>>
>>
>> --
>> Regards,
>> Varun
>>
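
The distribution step Stefan describes in the quoted message (listen to the main database's changes, then copy each document into the relevant per-user databases) could be planned roughly as below. This is a sketch, not his implementation: the "users" field on documents and the "userdb-<name>" naming are assumptions, and the rows mimic a _changes response fetched with include_docs=true.

```python
def plan_distribution(change_rows, user_field="users"):
    """Map each target user database to the ids of documents it should receive."""
    plan = {}
    for row in change_rows:
        doc = row.get("doc")
        # Skip rows without a body and design documents.
        if doc is None or doc["_id"].startswith("_design/"):
            continue
        # Deleted docs, or docs that no longer list a user, are not routed
        # here; removing them from devices needs a separate strategy.
        for user in doc.get(user_field, []):
            plan.setdefault(f"userdb-{user}", []).append(doc["_id"])
    return plan


# Hypothetical change rows for illustration.
changes = [
    {"id": "a", "seq": "1-x", "doc": {"_id": "a", "users": ["alice", "bob"]}},
    {"id": "b", "seq": "2-x", "doc": {"_id": "b", "users": ["alice"]}},
    {"id": "c", "seq": "3-x", "deleted": True, "doc": {"_id": "c", "_deleted": True}},
    {"id": "_design/app", "seq": "4-x", "doc": {"_id": "_design/app"}},
]
plan = plan_distribution(changes)
print(plan)
```

Each entry in the resulting plan could then be carried out as a bulk write or a doc_ids replication into that user's database.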
