Subject: Re: Half-baked idea: incremental virtual databases
From: Jan Lehnardt
Date: Mon, 4 Feb 2013 23:39:35 +0100
To: dev@couchdb.apache.org

On Feb 4, 2013, at 23:14, Nathan Vander Wilt wrote:

> On Jan 29, 2013, at 5:53 PM, Nathan Vander Wilt wrote:
>> So I've heard from both hosting providers that it is fine, but I also managed to take both of their shared services "down" with only ~100 users (200 continuous filtered replications). I'm only now at the point where I have tooling to build out arbitrarily large tests on my local machine to see the stats for myself, but as I understand it the issue is that every replication needs at least one couchjs process to do its filtering for it.
>>
>> So rather than inactive users mostly just taking up disk space, they're instead each costing a full-fledged process's worth of memory and system resources, all the time. As I understand it, this isn't much better on BigCouch either, since the data is scattered ± evenly across the machines: while the *computation* is spread, each node in the cluster still needs k*numberOfUsers couchjs processes running. So it's "scalable" in the sense that traditional databases are scalable: vertically, by buying machines with more and more memory.
>>
>> Again, I am still working on getting a better feel for the costs involved, but the basic theory with a master-to-N hub is not a great start: every change needs to be processed by all N replications. So if a user writes a document that ends up in the master database, every other user's filter function needs to process that change coming out of master.
>> Even when the N users are generating 0 (instead of M) changes, it's not doing M*N work, but there are still always 2*N open connections and supporting processes providing a nasty baseline for large values of N.
>
> Looks like I was wrong about needing enough RAM for one couchjs process per replication.
>
> CouchDB maintains a pool of (no more than query_server_config/os_process_limit) couchjs processes, and work is divvied out amongst these as necessary. I found a little meta-discussion of this system at https://issues.apache.org/jira/browse/COUCHDB-1375 and the code that uses it here: https://github.com/apache/couchdb/blob/master/src/couchdb/couch_query_servers.erl#L299
>
> On my laptop, I was able to spin up 250 users without issue. Beyond that, I start running into ± hardcoded system resource limits that Erlang has under Mac OS X, but from what I've seen the only theoretical scalability issue with going beyond that on Linux/Windows would be response times, as the worker processes become more and more saturated.
>
> It still seems wise to implement tiered replications for communicating between thousands of *active* user databases, but that seems reasonable to me.

CouchDB's design is obviously lacking here.

For immediate relief, I'll propose the usual jackhammer of unpopular responses: write your filters in Erlang. (sorry :) (There is a minimal sketch of one at the end of this mail.)

For the future: we already see progress in improving the view server situation. Once we get to a more desirable setup (yay node/v8), we can improve the view server communication; there is no reason you'd need a single JS OS process per active replication, and we should absolutely fix that.

--

Another angle is the replicator. I know Jason Smith has a prototype of this in Node, and it works. Instead of maintaining N active replications, we just keep a maximum number of active connections and cycle out ones that are currently inactive. The DbUpdateNotification mechanism should make this relatively straightforward. There is added overhead for setting up and tearing down replications, but we can make better use of resources and not clog things with inactive replications. Especially in a db-per-user scenario, most replications don't see much of an update most of the time; they should be inactive until data is written to any of the source databases. The mechanics in CouchDB are all there for this, we just need to write it. (A rough sketch of the cycling decision is also appended below.)

--

Nate, thanks for sharing your findings and for bearing with us, despite your very understandable frustrations. It is people like you who allow us to make CouchDB better!

Best
Jan

--
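
For reference, a minimal sketch of such an Erlang filter, assuming CouchDB 1.x with the native query server enabled (it ships disabled by default); the "type" field and its value here are purely illustrative:

    %% local.ini -- enable the native Erlang query server first:
    %% [native_query_servers]
    %% erlang = {couch_native_process, start_link, []}
    %%
    %% Stored as a filter in a design doc with "language": "erlang".
    %% It runs inside the Erlang VM, so no couchjs OS process is needed.
    fun({Doc}, {_Req}) ->
        couch_util:get_value(<<"type">>, Doc) =:= <<"mail">>
    end.

A replication would then reference it as usual via "filter": "designdocname/filtername".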
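
The couchjs pool cap Nathan mentions is a plain ini setting; a sketch for local.ini (the value shown is illustrative, not a recommendation):

    [query_server_config]
    ; upper bound on pooled couchjs OS processes,
    ; shared across all views and filters
    os_process_limit = 25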
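
And a rough sketch of the replication-cycling decision described above. None of this is existing CouchDB code; the module, function names, and data shapes are hypothetical, and the actual replication start/stop plumbing is left out:

    %% repl_cycle.erl -- hypothetical sketch, not CouchDB code.
    %% Active is a list of {DbName, LastActiveTimestamp} pairs for
    %% the replications currently holding connections.
    -module(repl_cycle).
    -export([on_db_updated/3]).

    %% Called when a db update notification reports a write to DbName.
    on_db_updated(DbName, Active, MaxActive) ->
        case lists:keymember(DbName, 1, Active) of
            true ->
                already_running;
            false when length(Active) < MaxActive ->
                {start, DbName};
            false ->
                %% at the cap: cycle out the longest-idle replication
                [{Idle, _} | _] = lists:keysort(2, Active),
                {cycle_out, Idle, DbName}
        end.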