couchdb-dev mailing list archives

From Robert Samuel Newson <rnew...@apache.org>
Subject Re: Multiple database backup strategy
Date Sat, 19 Mar 2016 18:36:13 GMT
Hi,

The problem is that _db_updates is not guaranteed to see every update, so I think it falls
at the first hurdle.

What couch_replicator_manager does in couchdb 2.0 (though not in the version that Cloudant
originally contributed) is to use couch_event, notice which events are for _replicator shards,
and trigger management work from that.

Some work I'm embarking on, with a few other devs here at Cloudant, is to enhance the replicator
manager so that it does not run all jobs at once. It is indeed the plan to have each of those
jobs run for a while, then kill them (they checkpoint, then close all resources) and reschedule
them later. It's TBD whether we'd always strip feed=continuous from those. We _could_ let each
job run to completion (i.e., caught up to the source db as of the start of the replication job),
but I think we have to be a bit smarter and allow replication jobs that constantly have work to
do (i.e., the source db is always busy) to run as they do today, with feed=continuous, unless
forcibly ousted by a scheduler because of some configured concurrency limit.
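
As a rough illustration of the run-for-a-while-then-reschedule idea above, here is a minimal
Python sketch. It is not the replicator manager's code and does not talk to CouchDB: Job,
run_round, run_all, max_jobs and slice_size are all hypothetical names, and "pending" is a
made-up count of changes still to replicate. It also does not model the feed=continuous
exception for always-busy sources.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical stand-in for one replication job."""
    name: str
    pending: int          # changes still to replicate (made-up unit)
    checkpoints: int = 0  # times the job was stopped with work remaining


def run_round(queue, max_jobs, slice_size):
    """Run up to max_jobs jobs for one time slice; return names that finished.

    Each running job handles at most slice_size changes. A job with work
    left is checkpointed and moved to the back of the queue.
    """
    finished = []
    running = [queue.popleft() for _ in range(min(max_jobs, len(queue)))]
    for job in running:
        job.pending -= min(slice_size, job.pending)
        if job.pending == 0:
            finished.append(job.name)
        else:
            job.checkpoints += 1   # assumed: checkpoint, then release resources
            queue.append(job)      # reschedule for a later round
    return finished


def run_all(jobs, max_jobs=2, slice_size=100):
    """Repeat rounds until every job has caught up; return completion order."""
    queue = deque(jobs)
    order = []
    while queue:
        order.extend(run_round(queue, max_jobs, slice_size))
    return order
```

With max_jobs=2, a job that needs several slices is interrupted and requeued while shorter
jobs get their turn, which is the behaviour the concurrency setting is meant to produce.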

I note for completeness that the work we're planning explicitly includes "multi database"
strategies. You'll hopefully be able to make a single _replicator doc that represents your
entire intention (e.g., "replicate _all_ dbs from server1 to server2").
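
Until something like that exists, the same intention has to be expanded by hand into one
_replicator doc per database. A minimal Python sketch of that expansion follows, assuming you
already have the list of database names (for example from GET /_all_dbs). The function name,
the "backup-" doc id prefix and the choice to skip databases whose names start with "_" are
assumptions made for illustration; "source", "target", "continuous" and "create_target" are
the usual _replicator doc fields.

```python
from urllib.parse import quote


def replication_docs(db_names, source_server, target_server, continuous=False):
    """Build one _replicator doc per database name (hypothetical helper)."""
    docs = []
    for name in db_names:
        if name.startswith("_"):
            # Assumption: leave system dbs such as _users and _replicator alone.
            continue
        encoded = quote(name, safe="")  # db names may contain "/" and similar
        docs.append({
            "_id": "backup-" + name,    # hypothetical naming scheme
            "source": f"{source_server}/{encoded}",
            "target": f"{target_server}/{encoded}",
            "continuous": continuous,
            "create_target": True,
        })
    return docs


docs = replication_docs(
    ["_users", "db1", "team/a"],
    "http://server1:5984",
    "http://server2:5984",
)
```

Each resulting dict would then be written to the _replicator database. Because databases come
and go during the day, the expansion has to be rerun whenever the list changes.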

B.


> On 14 Mar 2016, at 02:40, Adam Kocoloski <kocolosk@apache.org> wrote:
> 
> 
>> On Mar 10, 2016, at 3:18 AM, Jan Lehnardt <jan@apache.org> wrote:
>> 
>>> 
>>> On 09 Mar 2016, at 21:29, Nick Wood <nwood888@gmail.com> wrote:
>>> 
>>> Hello,
>>> 
>>> I'm looking to back up a CouchDB server with multiple databases. Currently
>>> 1,400, but it fluctuates up and down throughout the day as new databases
>>> are added and old ones deleted. ~10% of the databases are written to within
>>> any 5 minute period of time.
>>> 
>>> Goals
>>> - Maintain a continual off-site snapshot of all databases, preferably no
>>> older than a few seconds (or minutes)
>>> - Be efficient with bandwidth (i.e. not copy the whole database file for
>>> every backup run)
>>> 
>>> My current solution watches the global _changes feed and fires up a
>>> continuous replication to an off-site server whenever it sees a change. If
>>> it doesn't see a change from a database for 10 minutes, it kills that
>>> replication. This means I only have ~150 active replications running on
>>> average at any given time.
>> 
>> How about instead of using continuous replications and killing them,
>> use non-continuous replications based on _db_updates? They end
>> automatically and should use fewer resources then.
>> 
>> Best
>> Jan
>> --
> 
> In my opinion this is actually a design we should adopt for CouchDB’s own replication
> manager. Keeping all those _changes listeners running is needlessly expensive now that we
> have _db_updates.
> 
> Adam

