couchdb-dev mailing list archives

From Adam Kocoloski <kocol...@apache.org>
Subject Re: Multiple database backup strategy
Date Sat, 19 Mar 2016 20:31:37 GMT
Hi Bob, comments inline:

> On Mar 19, 2016, at 2:36 PM, Robert Samuel Newson <rnewson@apache.org> wrote:
> 
> Hi,
> 
> The problem is that _db_updates is not guaranteed to see every update, so I think it falls at the first hurdle.

Do you mean to say that a listener of _db_updates is not guaranteed to see every updated *database*?
I think it would be helpful for the discussion to describe the scenario in which an updated
database permanently fails to show up in the feed. My recollection is that it’s quite byzantine.
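
For anyone following along who hasn't used the feed, a consumer looks roughly like this (a sketch in Python with the requests library; the URL and credentials are placeholders, and the field names follow the 2.x response shape):

    import json
    import requests

    COUCH = "http://localhost:5984"                      # placeholder server URL

    # Tail the global _db_updates feed; each event names a database that changed.
    resp = requests.get(COUCH + "/_db_updates",
                        params={"feed": "continuous", "heartbeat": 10000},
                        auth=("admin", "secret"),        # placeholder credentials
                        stream=True)
    for line in resp.iter_lines():
        if not line:
            continue                                     # heartbeat keep-alives are blank
        event = json.loads(line)
        print(event.get("type"), event.get("db_name"))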

> What couch_replicator_manager does in couchdb 2.0 (though not in the version that Cloudant originally contributed) is to us ecouch_event, notice which are to _replicator shards, and trigger management work from that.

Did you mean to say “couch_event”? I assume so. You’re describing how the replicator
manager discovers new replication jobs, not how the jobs discover new updates to source databases
specified by replication jobs. Seems orthogonal to me unless I missed something.
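
For context, the jobs being discovered are plain documents in the _replicator database. A minimal sketch of creating one (Python with requests; the server URL, database names, and credentials are placeholders):

    import requests

    COUCH = "http://localhost:5984"                      # placeholder server URL

    # A replication job is just a document written to the _replicator database.
    doc = {
        "_id": "db1-to-server2",                         # any unique id
        "source": "http://server1:5984/db1",
        "target": "http://server2:5984/db1",
        "continuous": True,                              # the long-lived mode under discussion
    }
    resp = requests.post(COUCH + "/_replicator", json=doc, auth=("admin", "secret"))
    print(resp.status_code, resp.json())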

> Some work I'm embarking on, with a few other devs here at Cloudant, is to enhance the replicator manager to not run all jobs at once and it is indeed the plan to have each of those jobs run for a while, kill them (they checkpoint then close all resources) and reschedule them later. It's TBD whether we'd always strip feed=continuous from those. We _could_ let each job run to completion (i.e, caught up to the source db as of the start of the replication job) but I think we have to be a bit smarter and allow replication jobs that constantly have work to do (i.e, the source db is always busy), to run as they run today, with feed=continuous, unless forcibly ousted by a scheduler due to some configuration concurrency setting.

So I think this is really the crux of the issue. My contention is that permanently occupying
a socket for each continuous replication with the same source and mediator is needlessly expensive,
and that _db_updates could be an elegant replacement.
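
Concretely, the shape I have in mind is something like the sketch below (Python with requests; URLs, credentials, and the target naming scheme are placeholders): a single _db_updates listener triggering short one-shot replications only for databases that actually changed, rather than one continuous replication, and one open socket, per database.

    import json
    import requests

    SOURCE = "http://server1:5984"                       # placeholder URLs and credentials
    TARGET = "http://server2:5984"
    AUTH = ("admin", "secret")

    def replicate_once(db_name):
        # Deliberately no feed=continuous: the job catches up to the source
        # as of now, checkpoints, and exits, releasing its socket.
        requests.post(SOURCE + "/_replicate",
                      json={"source": SOURCE + "/" + db_name,
                            "target": TARGET + "/" + db_name,
                            "create_target": True},
                      auth=AUTH)

    # One long-lived connection in total, instead of one per replicated database.
    resp = requests.get(SOURCE + "/_db_updates",
                        params={"feed": "continuous", "heartbeat": 10000},
                        auth=AUTH, stream=True)
    for line in resp.iter_lines():
        if not line:
            continue
        event = json.loads(line)
        if event.get("type") == "updated":
            replicate_once(event["db_name"])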

> I note for completeness that the work we're planning explicitly includes "multi database" strategies, you'll hopefully be able to make a single _replicator doc that represents your entire intention (e.g, "replicate _all_ dbs from server1 to server2").

Nice! It’ll be good to hear more about that design as it evolves, particularly in aspects
like discovery of newly created source databases and reporting of 403s and other fatal errors.
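
Purely as a strawman to anchor that conversation, such a doc might end up looking something like the following. To be clear, this is invented syntax, nothing like it exists today, and the field that expresses "all databases" is entirely hypothetical.

    # Hypothetical "multi database" _replicator doc; every field beyond the
    # familiar ones is made up purely to make the idea concrete.
    doc = {
        "_id": "all-dbs-server1-to-server2",
        "source": "http://server1:5984",                 # a whole server, not one database
        "target": "http://server2:5984",
        "databases": "_all_dbs",                         # hypothetical "every source db" selector
        "continuous": True,
    }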

Adam