couchdb-user mailing list archives

From Pedro Narciso García Revington <p.reving...@gmail.com>
Subject Re: The state of filtered replication
Date Thu, 26 May 2016 07:55:26 GMT
@Stefan

It can be as simple as putting the replicas behind a load balancer and
pointing all your clients to it, but it depends on how your application
works. This also helps you scale the solution, since you can add more
replicas to the load balancer without having to tell the nodes. Things get
more complicated if you also want to write.
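
As a rough illustration of the layout above (a sketch under assumptions, not anyone's actual setup): the master can push to each replica through CouchDB's `_replicate` endpoint, which takes a JSON body with `source`, `target` and, optionally, `continuous`. The host names and the database name `mydb` below are made up for the example, and the helper only builds the request bodies; it does not send them.

```python
import json

# Hypothetical URLs, for illustration only.
MASTER = "http://master.example.com:5984/mydb"
REPLICAS = [
    "http://replica1.example.com:5984/mydb",
    "http://replica2.example.com:5984/mydb",
]


def build_replication_jobs(master, replicas, continuous=True):
    """Return one JSON-serialisable body per replica, each suitable
    for POSTing to a CouchDB /_replicate endpoint."""
    return [
        {"source": master, "target": replica, "continuous": continuous}
        for replica in replicas
    ]


if __name__ == "__main__":
    # Print the bodies instead of sending them, so nothing here
    # depends on a running CouchDB.
    for job in build_replication_jobs(MASTER, REPLICAS):
        print(json.dumps(job, sort_keys=True))
```

Adding a replica then means one more job here and one more backend in the load balancer. If clients also write, each replica would need a job in the opposite direction as well, which is where it gets more complicated.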

2016-05-25 12:39 GMT+02:00 Stefan du Fresne <stefan@medicmobile.org>:

> Hi Pedro,
>
> Thanks for your advice.
>
> This is definitely something that is in the back of our minds, along with
> looking into couchdb clustering. Another similar option we’re considering
> is having filtered replication between those replicas and having them
> represent regions (our data permission structure is basically report <-
> person <- family <- region <- larger region <- still larger region). This
> would still involve filtered replication, but would cut down on irrelevant
> documents that users had to filter through. We’re still at the stage of
> trying to get the most out of one server however.
>
> On your example though, to be clear, assigning users to replicas is
> something that I have to manage myself, correct? Do you know if a
> particular user needs to stay on the same replica or if I could just
> dumbly direct them to any existing node? Naively I’d think that I could do
> the latter, but I’ve noticed one-way replication seems to involve passing
> some metadata back to the server (Pouch does this, though I’ve never really
> looked into what it’s sending or what Couch does with it), so it’s not
> clear how stateful this kind of thing is.
>
> Cheers,
> Stefan
>
> > On 25 May 2016, at 09:51, Pedro Narciso García Revington <p.revington@gmail.com> wrote:
> >
> > Because couchdb supports master master replication you can alter your
> > schema to:
> >
> > master couchdb → couchdb replica 1 → some clients
> >                → couchdb replica 2 → some other clients
> >
> > So you can distribute the load between the replicas.
> >
> > 2016-05-25 10:34 GMT+02:00 Stefan du Fresne <stefan@medicmobile.org>:
> >
> >> Hello all,
> >>
> >> I work on an app that involves a large amount of CouchDB filtered
> >> replication (every user has a filtered subset of the DB locally via
> >> PouchDB). Currently filtered replication is our number 1 performance
> >> bottleneck for rolling out to more users, and I'm trying to work out
> >> where we can go from here.
> >>
> >> Our current setup is one CouchDB database and N PouchDB installations,
> >> which all two-way replicate, with the CouchDB->PouchDB replication being
> >> filtered based on user permissions / relevance [1].
> >>
> >> Our issue is that as we add users a) total document creation velocity
> >> increases, and b) the proportion of documents that are relevant to any
> >> particular user decreases. These two points cause replication-- both
> >> initial onboarding and continual-- to take longer and longer.
> >>
> >> At this stage we are being forced to manually limit the number of users
> >> we onboard at any particular time to half a dozen or so, or risk CouchDB
> >> being unresponsive [2]. As we'd want to be onboarding 50-100 at any
> >> particular time due to how we're rolling out, you can imagine that this
> >> is pretty painful.
> >>
> >> I have already re-written the filter in Erlang, which halved its
> >> execution time, which is awesome!
> >>
> >> I also attempted to simplify the filter to increase performance.
> >> However, filter speed seems more dependent on the physical size of your
> >> filter as opposed to what code executes, which makes writing a simple
> >> filter that can fall back to a complicated filter not terribly useful
> >> (see: https://issues.apache.org/jira/browse/COUCHDB-3021)
> >>
> >> If the above linked ticket is fixed (if it can be) this would make our
> >> filter 3-4x faster again. However, this still wouldn't address the
> >> fundamental issue that filtered replication is very CPU-intensive, and
> >> so, as noted above, doesn't seem to scale terribly well.
> >>
> >> Ideally then, I would like to remove filtered replication completely, but
> >> there does not seem to be a good alternative right now.
> >>
> >> Looking through the archives there was talk of adding view replication,
> >> see:
> >> https://mail-archives.apache.org/mod_mbox/couchdb-user/201307.mbox/%3CCAJNb-9pK4CVRHNwr83_DXCn%2B2_CZXgwDzbK3m_G2pdfWjSsFMA%40mail.gmail.com%3E
> >> but it doesn't look like this ever got resolved.
> >>
> >> There is also often talk of databases per user being a good scaling
> >> strategy, but we're basically doing that already (with PouchDB), and for
> >> us documents aren't owned / viewed by just one person, so this does not
> >> get us away from filtered replication (e.g. a supervisor replicates her
> >> documents as well as N sub-users' documents). There are potentially wild
> >> and crazy schemes that involve many different databases where the
> >> equivalent of filtering is expressed in replication relationships, but
> >> this would add a massive amount of complexity to our app, and I’m not
> >> even convinced it would work as there are lots of edge cases to consider.
> >>
> >> Does anyone know of anything else I can try to increase replication
> >> performance? Or to safeguard against many replicators unacceptably
> >> degrading couchdb performance? Does Couch 2.0 address any of these
> >> concerns?
> >>
> >> Thanks in advance,
> >> - Stefan du Fresne
> >>
> >> [1] security is handled by not exposing couch and going through a
> >> wrapper service that validates couch requests, relevance is hierarchy
> >> based (i.e. documents you or your subordinates are authors of are
> >> replicated to you)
> >> [2] there are also administrators / configurers that access couchdb
> >> directly