couchdb-dev mailing list archives

From Robert Samuel Newson <rnew...@apache.org>
Subject Re: Multiple database backup strategy
Date Sun, 20 Mar 2016 14:22:26 GMT
My point is that we can (and currently do) trigger the replication manager on receipt of the
database-updated event, which avoids all of the other parts of the sequence you describe that
could fail.
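
Concretely, the trigger is nothing more elaborate than this (an illustrative sketch only; the
real code is Erlang inside couch_replicator_manager, and the event shape and helper names below
are invented):

    # Illustrative sketch only; the real implementation is Erlang inside
    # couch_replicator_manager. The event shape and helper names are invented.
    def rescan_replication_docs(shard_name):
        """Hypothetical stand-in: scan the shard for replication docs and
        start/stop jobs accordingly."""
        pass

    def on_db_updated(shard_name):
        # Shard names look something like
        # "shards/00000000-1fffffff/_replicator.1458394077".
        db_name = shard_name.split("/")[-1].rsplit(".", 1)[0]
        if db_name == "_replicator":
            rescan_replication_docs(shard_name)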

The obvious difference, and I suspect this is what motivates Adam's position, is that _db_updates
can be called remotely. A solution using /_db_updates as its feed can run somewhere else; it
wouldn't even need to be a couchdb cluster. With the current 2.0 scheme, the _replicator db
has to live on the nodes performing replication management (and therefore it depends on
couch_{btree,file} etc). That's a huge incentive to go the /_db_updates route, and it would
serve as a model for others, like pouchdb, that cannot choose to co-locate.
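
To make that concrete, a remote consumer needs nothing beyond the public HTTP API. Here's a
minimal sketch of a backup driver sitting on the documented GET /_db_updates?feed=continuous
and POST /_replicate endpoints; the URLs, credentials, and target naming are placeholders, and
the exact query parameters may differ between versions:

    # Minimal sketch: drive one-shot backup replications from /_db_updates.
    # URLs/credentials are placeholders; assumes admin access on the source.
    import json
    import requests

    SOURCE = "http://admin:password@source-cluster:5984"
    TARGET = "http://admin:password@backup-cluster:5984"

    def backup_forever():
        resp = requests.get(SOURCE + "/_db_updates",
                            params={"feed": "continuous", "since": "now"},
                            stream=True, timeout=None)
        for line in resp.iter_lines():
            if not line:
                continue  # heartbeat newline on an idle feed
            event = json.loads(line)
            if event.get("type") in ("created", "updated"):
                db = event["db_name"]
                # One-shot (non-continuous) replication; it is simply
                # re-triggered the next time the database shows up in the
                # feed, so no socket is held open per source database.
                requests.post(SOURCE + "/_replicate",
                              json={"source": SOURCE + "/" + db,
                                    "target": TARGET + "/" + db,
                                    "create_target": True})

    if __name__ == "__main__":
        backup_forever()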

One side-benefit we get from using database-updated events from the _replicator shards, though,
is that it helps us determine which node will run any particular job. We allocate a job to
the lowest live erlang node that hosts the document. If we go with /_db_updates, we'll need
some other scheme. That's not a bad thing (indeed, it could be a very good thing), but it
would need more thought. While in Seattle we did discuss both directions at some length and
believe we'd need some form of leader-election system; the leader would then assign (and
rebalance) replication jobs across the erlang cluster. As a possible starting point, I pointed
at https://github.com/cloudant/sobhuza, a proof-of-concept implementation I wrote a while back
of an algorithm I trust.
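
For what it's worth, the current placement rule is simple enough to sketch in a few lines
(illustrative only; the real logic is Erlang in the replicator manager, and the two lookups
below are hypothetical stand-ins for mem3/cluster-membership calls):

    # Illustrative sketch of "allocate the job to the lowest live node that
    # hosts the document". Both helpers are hypothetical stand-ins.
    def nodes_hosting(shard_db, doc_id):
        # Would consult shard membership (mem3) for the copies of this doc.
        return ["node1@host-a", "node2@host-b", "node3@host-c"]

    def is_live(node):
        # Would consult erlang cluster membership (nodes()).
        return True

    def owner(shard_db, doc_id):
        live = [n for n in nodes_hosting(shard_db, doc_id) if is_live(n)]
        # The lowest live node that hosts the replication doc runs the job.
        return min(live) if live else None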

B.

P.S. I'm using Mail.app and simply replying where it sticks the cursor (at the top), but in
other forums I've been berated for top-posting. Should I modify my reply style here?

On 19 Mar 2016, at 21:42, Benjamin Bastian <bbastian@apache.org> wrote:
> 
> When a shard is updated, it'll trigger a "database updated" event. CouchDB
> will hold those updates in memory for a configurable amount of time in
> order to dedupe updates. It'll then cast lists of updated databases to
> nodes which host the relevant _db_updates shards for further deduplication.
> It's only at that point that the updates are persisted. Only a single
> update needs to reach the _db_updates DB. IIRC, a single update triggers up to
> n^3 messages (assuming the _db_updates DB and the updated DB have the same n), so it
> may be a bit tricky for all of them to fail. You'd need coordinated node
> failure. Perhaps something like datacenter power loss. Another possible
> issue is if all the nodes which host a shard range of the _db_updates DB
> are unreachable by the nodes which host a shard range of any other DB. Even
> if it was momentary, it'd cause messages to be dropped from the _db_updates
> feed.
> 
> For n=3 DBs, it seems like it'd be difficult for all of those things to go
> wrong (except perhaps in the case of power loss or catastrophic network
> failure). For n=1 DBs, you'd simply need to reboot a node soon after an
> update.
> 
> On Sat, Mar 19, 2016 at 1:31 PM, Adam Kocoloski <kocolosk@apache.org> wrote:
> 
>> Hi Bob, comments inline:
>> 
>>> On Mar 19, 2016, at 2:36 PM, Robert Samuel Newson <rnewson@apache.org>
>>> wrote:
>>> 
>>> Hi,
>>> 
>>> The problem is that _db_updates is not guaranteed to see every update,
>>> so I think it falls at the first hurdle.
>> 
>> Do you mean to say that a listener of _db_updates is not guaranteed to see
>> every updated *database*? I think it would be helpful for the discussion to
>> describe the scenario in which an updated database permanently fails to
>> show up in the feed. My recollection is that it’s quite byzantine.
>> 
>>> What couch_replicator_manager does in couchdb 2.0 (though not in the
>>> version that Cloudant originally contributed) is to use couch_event, notice
>>> which events are for _replicator shards, and trigger management work from that.
>> 
>> Did you mean to say “couch_event”? I assume so. You’re describing how the
>> replicator manager discovers new replication jobs, not how the jobs
>> discover new updates to source databases specified by replication jobs.
>> Seems orthogonal to me unless I missed something.
>> 
>>> Some work I'm embarking on, with a few other devs here at Cloudant, is
>>> to enhance the replicator manager to not run all jobs at once. The plan is
>>> indeed to have each of those jobs run for a while, kill them (they
>>> checkpoint, then close all resources) and reschedule them later. It's TBD
>>> whether we'd always strip feed=continuous from those. We _could_ let each
>>> job run to completion (i.e., caught up to the source db as of the start of
>>> the replication job), but I think we have to be a bit smarter and allow
>>> replication jobs that constantly have work to do (i.e., the source db is
>>> always busy) to run as they run today, with feed=continuous, unless
>>> forcibly ousted by a scheduler due to some configured concurrency
>>> setting.
>> 
>> So I think this is really the crux of the issue. My contention is that
>> permanently occupying a socket for each continuous replication with the
>> same source and mediator is needlessly expensive, and that _db_updates
>> could be an elegant replacement.
>> 
>>> I note for completeness that the work we're planning explicitly
>>> includes "multi database" strategies; you'll hopefully be able to make a
>>> single _replicator doc that represents your entire intention (e.g.,
>>> "replicate _all_ dbs from server1 to server2").
>> 
>> Nice! It’ll be good to hear more about that design as it evolves,
>> particularly in aspects like discovery of newly created source databases
>> and reporting of 403s and other fatal errors.
>> 
>> Adam

