Subject: Re: Multiple database backup strategy
From: Robert Samuel Newson
Date: Sat, 19 Mar 2016 18:36:13 +0000
To: dev@couchdb.apache.org
Message-Id: <2B92BD0A-427D-4C0D-A842-2368C35DC5D2@apache.org>
References: <139A3579-EDE8-4DE3-B432-860C13B553EA@apache.org>

Hi,

The problem is that _db_updates is not guaranteed to see every update,
so I think it falls at the first hurdle.

What couch_replicator_manager does in CouchDB 2.0 (though not in the
version that Cloudant originally contributed) is to use couch_event,
notice which events are for _replicator shards, and trigger management
work from that.

Some work I'm embarking on, with a few other devs here at Cloudant, is
to enhance the replicator manager so that it does not run all jobs at
once. The plan is indeed to let each of those jobs run for a while, kill
it (it checkpoints, then closes all resources) and reschedule it later.
It's TBD whether we'd always strip feed=continuous from those. We
_could_ let each job run to completion (i.e., caught up to the source db
as of the start of the replication job), but I think we have to be a bit
smarter and allow replication jobs that constantly have work to do
(i.e., the source db is always busy) to run as they run today, with
feed=continuous, unless forcibly ousted by a scheduler due to some
concurrency setting in the configuration.

I note for completeness that the work we're planning explicitly includes
"multi database" strategies: you'll hopefully be able to make a single
_replicator doc that represents your entire intention (e.g., "replicate
_all_ dbs from server1 to server2").

B.

> On 14 Mar 2016, at 02:40, Adam Kocoloski wrote:
>
>
>> On Mar 10, 2016, at 3:18 AM, Jan Lehnardt wrote:
>>
>>>
>>> On 09 Mar 2016, at 21:29, Nick Wood wrote:
>>>
>>> Hello,
>>>
>>> I'm looking to back up a CouchDB server with multiple databases. Currently
>>> 1,400, but it fluctuates up and down throughout the day as new databases
>>> are added and old ones deleted. ~10% of the databases are written to within
>>> any 5 minute period of time.
>>>
>>> Goals
>>> - Maintain a continual off-site snapshot of all databases, preferably no
>>> older than a few seconds (or minutes)
>>> - Be efficient with bandwidth (i.e. not copy the whole database file for
>>> every backup run)
>>>
>>> My current solution watches the global _changes feed and fires up a
>>> continuous replication to an off-site server whenever it sees a change. If
>>> it doesn't see a change from a database for 10 minutes, it kills that
>>> replication.
>>> This means I only have ~150 active replications running on
>>> average at any given time.
>>
>> How about instead of using continuous replications and killing them,
>> use non-continuous replications based on _db_updates? They end
>> automatically and should use fewer resources then.
>>
>> Best
>> Jan
>> --
>
> In my opinion this is actually a design we should adopt for CouchDB’s own
> replication manager. Keeping all those _changes listeners running is
> needlessly expensive now that we have _db_updates.
>
> Adam
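
For illustration, the approach Jan suggests in the quoted text (non-continuous
replications driven by _db_updates) could be sketched roughly as below. This is
a minimal sketch, not how any replicator manager is implemented: the function
name one_shot_replications and the server URLs are hypothetical, and it assumes
the feed delivers one JSON object per line with "db_name" and "type" fields. It
only builds request bodies for the _replicate endpoint and leaves "continuous"
unset, so each replication would end on its own.

```python
import json
from urllib.parse import quote


def one_shot_replications(event_lines, source_base, target_base):
    """Turn _db_updates events into one-shot replication request bodies.

    Hypothetical helper: event_lines is any iterable of text lines, each
    either blank (a heartbeat) or a JSON object with "db_name" and "type".
    """
    pending = {}  # db name -> request body, so repeated updates collapse
    for line in event_lines:
        line = line.strip()
        if not line:
            continue  # heartbeat lines carry no event
        event = json.loads(line)
        db = event["db_name"]
        if event["type"] == "deleted":
            pending.pop(db, None)  # nothing left to replicate
            continue
        encoded = quote(db, safe="")  # db names may contain "/"
        pending[db] = {
            "source": f"{source_base}/{encoded}",
            "target": f"{target_base}/{encoded}",
            "create_target": True,
            # "continuous" is deliberately omitted: the job ends when caught up
        }
    return list(pending.values())


if __name__ == "__main__":
    sample = [
        '{"db_name": "a", "type": "updated", "seq": "1"}',
        "",
        '{"db_name": "a", "type": "updated", "seq": "2"}',
    ]
    for body in one_shot_replications(
        sample, "http://server1:5984", "http://server2:5984"
    ):
        print(json.dumps(body))
```

Each body would then be POSTed to /_replicate. Since _db_updates is not
guaranteed to see every update, as noted at the top of this message, a setup
like this would presumably still need an occasional pass over all databases to
catch anything missed.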