couchdb-dev mailing list archives

From: Robert Samuel Newson <rnew...@apache.org>
Subject: Re: Multiple database backup strategy
Date: Sun, 20 Mar 2016 14:11:25 GMT
Hi,

If there's a chance that a user can add a single _replicator doc without it being picked up
by _db_updates, I think that's a deal breaker. If a user is regularly adding/updating/deleting
_replicator docs then, yes, I believe we can say we'll eventually notice.
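
For concreteness, a minimal sketch of the scenario I mean, not anything from the
replicator's internals. The host, credentials, and database names are assumptions, and
the db_name/type fields follow the documented 2.x _db_updates format: one thread tails
the _db_updates continuous feed while a single _replicator doc is written, and the
question is whether the watcher is guaranteed to see the corresponding event.

    # Watch the global _db_updates feed and write one _replicator doc.
    import json
    import threading

    import requests

    COUCH = "http://localhost:5984"   # assumption: local node
    AUTH = ("admin", "password")      # assumption: admin credentials

    def watch_db_updates(seen):
        # One JSON object per line on the continuous feed; blank lines are heartbeats.
        resp = requests.get(
            COUCH + "/_db_updates",
            params={"feed": "continuous", "heartbeat": 10000},
            auth=AUTH,
            stream=True,
        )
        for line in resp.iter_lines():
            if not line:
                continue
            event = json.loads(line)
            if event.get("db_name") == "_replicator":
                seen.append(event)

    seen = []
    threading.Thread(target=watch_db_updates, args=(seen,), daemon=True).start()

    # Add a single _replicator doc; the question above is whether the watcher
    # is *guaranteed* to end up with an event for it in `seen`.
    doc = {"source": COUCH + "/source_db", "target": COUCH + "/target_db", "continuous": True}
    requests.put(COUCH + "/_replicator/example-job", json=doc, auth=AUTH).raise_for_status()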

I did mean 'use couch_event' not 'us ecouch_event', erroneous space bar activity owing to
jetlag.

We have the same goal here. Do you agree or disagree that a continuously _active_ replication
should simply stay running until descheduled? We obviously agree that an idle (or low activity)
replication should be shut down, freeing sockets, during its idle periods.
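
To make that distinction concrete, here's a rough sketch of the policy I have in mind;
the job object and its methods are made up for illustration and are not couch_replicator's
API. A busy source keeps the loop (and its sockets) alive; a quiet source makes the job
checkpoint, close its connections, and wait to be rescheduled.

    import time

    IDLE_WINDOW = 60  # seconds of source inactivity before releasing resources (assumption)

    def run_replication_slice(job):
        # Pull and apply changes until the source goes quiet, then checkpoint and stop.
        last_activity = time.monotonic()
        while True:
            batch = job.fetch_changes(timeout=IDLE_WINDOW)   # hypothetical method
            if batch:
                job.replicate(batch)                         # hypothetical method
                last_activity = time.monotonic()
            elif time.monotonic() - last_activity >= IDLE_WINDOW:
                job.checkpoint()          # persist the source sequence we reached
                job.close_connections()   # free the sockets while idle
                return "idle"             # the scheduler can restart it later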

B.

> On 19 Mar 2016, at 20:31, Adam Kocoloski <kocolosk@apache.org> wrote:
> 
> Hi Bob, comments inline:
> 
>> On Mar 19, 2016, at 2:36 PM, Robert Samuel Newson <rnewson@apache.org> wrote:
>> 
>> Hi,
>> 
>> The problem is that _db_updates is not guaranteed to see every update, so I think
>> it falls at the first hurdle.
> 
> Do you mean to say that a listener of _db_updates is not guaranteed to see every updated
> *database*? I think it would be helpful for the discussion to describe the scenario in which
> an updated database permanently fails to show up in the feed. My recollection is that it’s
> quite byzantine.
> 
>> What couch_replicator_manager does in couchdb 2.0 (though not in the version that
>> Cloudant originally contributed) is to us ecouch_event, notice which are to _replicator shards,
>> and trigger management work from that.
> 
> Did you mean to say “couch_event”? I assume so. You’re describing how the replicator
> manager discovers new replication jobs, not how the jobs discover new updates to source databases
> specified by replication jobs. Seems orthogonal to me unless I missed something.
> 
>> Some work I'm embarking on, with a few other devs here at Cloudant, is to enhance
>> the replicator manager to not run all jobs at once, and it is indeed the plan to have each
>> of those jobs run for a while, kill them (they checkpoint then close all resources), and reschedule
>> them later. It's TBD whether we'd always strip feed=continuous from those. We _could_ let
>> each job run to completion (i.e., caught up to the source db as of the start of the replication
>> job), but I think we have to be a bit smarter and allow replication jobs that constantly have
>> work to do (i.e., the source db is always busy) to run as they run today, with feed=continuous,
>> unless forcibly ousted by a scheduler due to some configured concurrency setting.
> 
> So I think this is really the crux of the issue. My contention is that permanently occupying
> a socket for each continuous replication with the same source and mediator is needlessly expensive,
> and that _db_updates could be an elegant replacement.
> 
>> I note for completeness that the work we're planning explicitly includes "multi database"
>> strategies; you'll hopefully be able to make a single _replicator doc that represents
>> your entire intention (e.g., "replicate _all_ dbs from server1 to server2").
> 
> Nice! It’ll be good to hear more about that design as it evolves, particularly in aspects
> like discovery of newly created source databases and reporting of 403s and other fatal errors.
> 
> Adam
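
For what it's worth, a very rough sketch of the scheduling loop described in the quoted
text above; the job type and its start/stop/finished methods are hypothetical, and the
actual work at Cloudant may look nothing like this. The point is only the shape: at most
max_jobs replications hold sockets at once, each runs for a slice, then checkpoints,
closes its resources, and rejoins the queue.

    import time
    from collections import deque

    def schedule(jobs, max_jobs=10, slice_seconds=300):
        waiting = deque(jobs)   # jobs holding no resources
        running = []            # (job, started_at) pairs

        while waiting or running:
            # Fill free slots up to the configured concurrency limit.
            while waiting and len(running) < max_jobs:
                job = waiting.popleft()
                job.start()                               # resume from checkpoint, open connections
                running.append((job, time.monotonic()))

            time.sleep(1)

            still_running = []
            for job, started_at in running:
                if job.finished():                        # a one-shot job that caught up
                    continue
                if waiting and time.monotonic() - started_at >= slice_seconds:
                    job.stop()                            # checkpoint, then close all resources
                    waiting.append(job)                   # rescheduled later
                else:
                    still_running.append((job, started_at))
            running = still_running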

