Date: Wed, 25 May 2011 12:23:29 -0700
Subject: Thoughts on server wide replication
From: Chris Stockton
To: user@couchdb.apache.org

I was thinking that if there were server-wide replication, we could support many more users. Currently we are at a few thousand, and we are starting to feel the expense of all the TCP connections and replication tasks. The calls to status to monitor that the replications are running are also getting very expensive and noticeable. It seems to me that an API for server-wide replication would greatly benefit our use pattern, and I'm sure that of anyone else who scales through many databases (one database is one customer).

Here are a few ideas for such a feature; I'm throwing them out here just to see if they spark interest. I will call this API _replicate_server for example purposes; the name is open to discussion.

To begin server-wide replication:

  curl -vX POST http://localhost:5984/_replicate_server -d '{"source":"example-database","target":"http://example.org/example-database"}'
  -> {"ok": true, <... other details>}

To begin server-wide replication with a filtering function. Here the function could maybe return false to skip a database, true to replicate it, or an array of filters to use as filtering functions for that database? This could be simple or very robust:

  function(dbName, req) {
    // Replicate only databases whose name starts with the prefix.
    return dbName.indexOf("my_interesting_dbs_prefix") === 0;
  }

  curl -vX POST http://localhost:5984/_replicate_server -d '{"source":"example-database","target":"http://example.org/example-database", "filter": "filters/server_filter"}'
  -> {"ok": true, <... other details>}

To begin server-wide replication for an array of dbs:

  curl -vX POST http://localhost:5984/_replicate_server -d '{"source":"example-database","target":"http://example.org/example-database", "database_names": ["db_1", "db_2" ..., "db_3050"]}'
  -> {"ok": true, <... other details>}

Other params for the request:

  "persistent": true|false
    Should this replication job persist through a couchdb restart? Maybe this adds an entry to the config file or something.

  "continuous": true|false
    Do a one-time pass of all dbs or not. Defaulting to true makes sense, but is inconsistent with _replicate. Maybe just not support one-time passes? My specific use cases don't require them, but I don't want to speak only for myself.

These are just some thoughts from my last 1-2 years or so of experience with couchdb and my use patterns. If we could trim down and improve replication usability a bit, I think couchdb could greatly benefit as a project. Right now, having to tell replication to start, having to make sure it runs on restart (I know changes of some sort are coming or already implemented for this), and monitoring your databases to make sure they are up to date is just a bit too much for the app tier to do, and I think it scares DBAs away from embracing the technology.

Overall I love couchdb; I find it to be a great product, and it has fit our needs very well.

-Chris
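
To make the idea above more concrete, here is a rough JavaScript sketch of the per-database fan-out that the app tier has to do today and that a server-wide API would presumably take over: filter a list of database names, then build one request body per database for the existing _replicate endpoint. The helper names (serverFilter, buildReplicationJobs), the sample database names, and the target base URL are made up for illustration; only the "source", "target", and "continuous" fields are taken from _replicate as it exists.

```javascript
// Hypothetical server-wide filter: select databases by name prefix.
function serverFilter(dbName) {
  return dbName.indexOf("my_interesting_dbs_prefix") === 0;
}

// Build one _replicate request body per selected database.
// targetBase is an assumed remote server URL such as "http://example.org".
function buildReplicationJobs(dbNames, targetBase, filter, continuous) {
  var base = targetBase.replace(/\/+$/, ""); // drop trailing slashes
  return dbNames
    .filter(function (name) { return filter(name); })
    .map(function (name) {
      return {
        source: name,
        target: base + "/" + encodeURIComponent(name),
        continuous: continuous
      };
    });
}

// Example input: three sample database names, two of which match the prefix.
var jobs = buildReplicationJobs(
  ["my_interesting_dbs_prefix_a", "other_db", "my_interesting_dbs_prefix_b"],
  "http://example.org/",
  serverFilter,
  true
);

// Each entry would be POSTed to /_replicate on the local server.
console.log(JSON.stringify(jobs, null, 2));
```

With thousands of databases this means thousands of separate requests and replication tasks to start and monitor, which is the overhead a single server-wide call would be meant to remove.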