From: Robert Newson
To: dev@couchdb.apache.org
Date: Mon, 4 Feb 2013 22:50:43 +0000
Subject: Re: Half-baked idea: incremental virtual databases

I had a mind to teach the _replicator db this trick. Since we have a
record of everything we need to resume a replication, there's no reason
for a one-to-one correspondence between a _replicator doc and a
replicator process. We can simply run N of them for a bit (say, a batch
of 1000 updates) and then switch to others. The internal db_updated
mechanism is a good way to notice when we might have updates worth
sending, but it's only half the story. A round-robin over all
_replicator docs (other than one-shot ones, of course) seems a really
neat trick to me.
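A minimal sketch of the loop this implies, assuming a queue of
continuous _replicator docs and a hypothetical replicate_batch/2 that
resumes a replication from its checkpointed source sequence, processes
at most BatchSize updates, checkpoints, and returns (none of this is
actual CouchDB code):

    %% Cycle one bounded worker over all continuous _replicator docs
    %% instead of keeping one long-lived process per doc.
    %% Usage sketch: round_robin(queue:from_list(RepDocs), 1000).
    round_robin(Queue0, BatchSize) ->
        case queue:out(Queue0) of
            {{value, RepDoc}, Queue1} ->
                %% replicate_batch/2 is hypothetical; see above.
                ok = replicate_batch(RepDoc, BatchSize),
                %% Requeue at the back and move on. A smarter version
                %% would skip docs with no pending changes, using the
                %% db_updated notifications mentioned above.
                round_robin(queue:in(RepDoc, Queue1), BatchSize);
            {empty, _} ->
                ok
        end.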
B.

On 4 February 2013 22:39, Jan Lehnardt wrote:
>
> On Feb 4, 2013, at 23:14, Nathan Vander Wilt wrote:
>
>> On Jan 29, 2013, at 5:53 PM, Nathan Vander Wilt wrote:
>>
>>> So I've heard from both hosting providers that it is fine, but also
>>> managed to take both of their shared services "down" with only ~100
>>> users (200 continuous filtered replications). I'm only now at the
>>> point where I have tooling to build out arbitrarily large tests on
>>> my local machine to see the stats for myself, but as I understand
>>> it the issue is that every replication needs at least one couchjs
>>> process to do its filtering for it.
>>>
>>> So rather than inactive users mostly just taking up disk space,
>>> they're instead costing a full-fledged process's worth of memory
>>> and system resources, each, all the time. As I understand it, this
>>> isn't much better on BigCouch either, since the data is scattered
>>> ± evenly across the machines, so while the *computation* is spread,
>>> each node in the cluster still needs k*numberOfUsers couchjs
>>> processes running. So it's "scalable" in the sense that traditional
>>> databases are scalable: vertically, by buying machines with more
>>> and more memory.
>>>
>>> Again, I am still working on getting a better feel for the costs
>>> involved, but the basic theory with a master-to-N hub is not a
>>> great start: every change needs to be processed by all N
>>> replications. So if a user writes a document that ends up in the
>>> master database, every other user's filter function needs to
>>> process that change coming out of master. Even when N users are
>>> generating 0 (instead of M) changes, it's not doing M*N work, but
>>> there are still always 2*N open connections and supporting
>>> processes providing a nasty baseline for large values of N.
>>
>> Looks like I was wrong about needing enough RAM for one couchjs
>> process per replication.
>>
>> CouchDB maintains a pool of (no more than
>> query_server_config/os_process_limit) couchjs processes, and work is
>> divvied out amongst these as necessary. I found a little
>> meta-discussion of this system at
>> https://issues.apache.org/jira/browse/COUCHDB-1375 and the code that
>> uses it at
>> https://github.com/apache/couchdb/blob/master/src/couchdb/couch_query_servers.erl#L299
>>
>> On my laptop, I was able to spin up 250 users without issue. Beyond
>> that, I start running into ± hardcoded system resource limits that
>> Erlang has under Mac OS X, but from what I've seen the only
>> theoretical scalability issue with going beyond that on
>> Linux/Windows would be response times, as the worker processes
>> become more and more saturated.
>>
>> It still seems wise to implement tiered replications for
>> communicating between thousands of *active* user databases, but that
>> seems reasonable to me.
>
> CouchDB's design is obviously lacking here.
>
> For immediate relief, I'll propose the usual jackhammer of unpopular
> responses: write your filters in Erlang. (sorry :)
>
> For the future: we already see progress in improving the view server
> situation. Once we get to a more desirable setup (yaynode/v8), we can
> improve the view server communication; there is no reason you'd need
> a single JS OS process per active replication, and we should
> absolutely fix that.
>
> --
>
> Another angle is the replicator. I know Jason Smith has a prototype
> of this in Node; it works. Instead of maintaining N active
> replications, we just keep a maximum number of active connections and
> cycle out ones that are currently inactive. The DbUpdateNotification
> mechanism should make this relatively straightforward. There is added
> overhead for setting up and tearing down replications, but we can
> make better use of resources and not clog things with inactive
> replications. Especially in a db-per-user scenario, most replications
> don't see much of an update most of the time; they should be inactive
> until data is written to any of the source databases. The mechanics
> in CouchDB are all there for this; we just need to write it.
>
> --
>
> Nate, thanks for sharing your findings and for bearing with us,
> despite your very understandable frustrations. It is people like you
> that allow us to make CouchDB better!
>
> Best
> Jan
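For reference, the Erlang filters Jan suggests live in a design
document with "language" set to "erlang" and run inside the Erlang VM,
so no couchjs process is involved. A sketch (the "owner" field and the
design-doc/filter names are made up for illustration):

    %% _design/core, filters.by_owner -- keep only one user's docs.
    %% Doc and Req arrive as EJSON proplists.
    fun({Doc}, {_Req}) ->
        couch_util:get_value(<<"owner">>, Doc) =:= <<"nvw">>
    end.

The native query server has to be enabled in local.ini first
([native_query_servers] erlang = {couch_native_process, start_link,
[]}), after which a replication can reference the filter as
"filter": "core/by_owner".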