From: Robert Newson
To: dev@couchdb.apache.org
Date: Mon, 4 Feb 2013 22:50:43 +0000
Subject: Re: Half-baked idea: incremental virtual databases

I had a mind to teach the _replicator db this trick. Since we have a
record of everything we need to resume a replication, there's no reason
for a one-to-one correspondence between a _replicator doc and a
replicator process. We can simply run N of them for a bit (say, a batch
of 1000 updates) and then switch to others. The internal db_updated
mechanism is a good way to notice when we might have updates worth
sending, but it's only half the story. A round-robin over all
_replicator docs (other than one-shot ones, of course) seems a really
neat trick to me.
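A minimal sketch of the loop this implies, assuming a queue of
continuous _replicator docs and a hypothetical replicate_batch/2 that
resumes a replication from its checkpointed source sequence, processes
at most BatchSize updates, checkpoints, and returns (none of this is
actual CouchDB code):

    %% Cycle one bounded worker over all continuous _replicator docs
    %% instead of keeping one long-lived process per doc.
    %% Usage sketch: round_robin(queue:from_list(RepDocs), 1000).
    round_robin(Queue0, BatchSize) ->
        case queue:out(Queue0) of
            {{value, RepDoc}, Queue1} ->
                %% replicate_batch/2 is hypothetical; see above.
                ok = replicate_batch(RepDoc, BatchSize),
                %% Requeue at the back and move on. A smarter version
                %% would skip docs with no pending changes, using the
                %% db_updated notifications mentioned above.
                round_robin(queue:in(RepDoc, Queue1), BatchSize);
            {empty, _} ->
                ok
        end.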
B.

On 4 February 2013 22:39, Jan Lehnardt wrote:
>
> On Feb 4, 2013, at 23:14, Nathan Vander Wilt wrote:
>
>> On Jan 29, 2013, at 5:53 PM, Nathan Vander Wilt wrote:
>>
>>> So I've heard from both hosting providers that it is fine, but also
>>> managed to take both of their shared services "down" with only ~100
>>> users (200 continuous filtered replications). I'm only now at the
>>> point where I have tooling to build out arbitrarily large tests on
>>> my local machine to see the stats for myself, but as I understand
>>> it the issue is that every replication needs at least one couchjs
>>> process to do its filtering for it.
>>>
>>> So rather than inactive users mostly just taking up disk space,
>>> they're instead costing a full-fledged process's worth of memory
>>> and system resources, each, all the time. As I understand it, this
>>> isn't much better on BigCouch either, since the data is scattered
>>> ± evenly across the machines, so while the *computation* is spread,
>>> each node in the cluster still needs k*numberOfUsers couchjs
>>> processes running. So it's "scalable" in the sense that traditional
>>> databases are scalable: vertically, by buying machines with more
>>> and more memory.
>>>
>>> Again, I am still working on getting a better feel for the costs
>>> involved, but the basic theory with a master-to-N hub is not a
>>> great start: every change needs to be processed by all N
>>> replications. So if a user writes a document that ends up in the
>>> master database, every other user's filter function needs to
>>> process that change coming out of master. Even when N users are
>>> generating 0 (instead of M) changes, it's not doing M*N work, but
>>> there are still always 2*N open connections and supporting
>>> processes providing a nasty baseline for large values of N.
>>
>> Looks like I was wrong about needing enough RAM for one couchjs
>> process per replication.
>>
>> CouchDB maintains a pool of (no more than
>> query_server_config/os_process_limit) couchjs processes, and work is
>> divvied out amongst these as necessary. I found a little
>> meta-discussion of this system at
>> https://issues.apache.org/jira/browse/COUCHDB-1375 and the code that
>> uses it at
>> https://github.com/apache/couchdb/blob/master/src/couchdb/couch_query_servers.erl#L299
>>
>> On my laptop, I was able to spin up 250 users without issue. Beyond
>> that, I start running into ± hardcoded system resource limits that
>> Erlang has under Mac OS X, but from what I've seen the only
>> theoretical scalability issue with going beyond that on
>> Linux/Windows would be response times, as the worker processes
>> become more and more saturated.
>>
>> It still seems wise to implement tiered replications for
>> communicating between thousands of *active* user databases, but that
>> seems reasonable to me.
>
> CouchDB's design is obviously lacking here.
>
> For immediate relief, I'll propose the usual jackhammer of unpopular
> responses: write your filters in Erlang. (sorry :)
>
> For the future: we already see progress in improving the view server
> situation. Once we get to a more desirable setup (yaynode/v8), we can
> improve the view server communication; there is no reason you'd need
> a single JS OS process per active replication, and we should
> absolutely fix that.
>
> --
>
> Another angle is the replicator. I know Jason Smith has a prototype
> of this in Node; it works. Instead of maintaining N active
> replications, we just keep a maximum number of active connections and
> cycle out ones that are currently inactive. The DbUpdateNotification
> mechanism should make this relatively straightforward. There is added
> overhead for setting up and tearing down replications, but we can
> make better use of resources and not clog things with inactive
> replications. Especially in a db-per-user scenario, most replications
> don't see much of an update most of the time; they should be inactive
> until data is written to any of the source databases. The mechanics
> in CouchDB are all there for this; we just need to write it.
>
> --
>
> Nate, thanks for sharing your findings and for bearing with us,
> despite your very understandable frustrations. It is people like you
> that allow us to make CouchDB better!
>
> Best
> Jan
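For reference, the Erlang filters Jan suggests live in a design
document with "language" set to "erlang" and run inside the Erlang VM,
so no couchjs process is involved. A sketch (the "owner" field and the
design-doc/filter names are made up for illustration):

    %% _design/core, filters.by_owner -- keep only one user's docs.
    %% Doc and Req arrive as EJSON proplists.
    fun({Doc}, {_Req}) ->
        couch_util:get_value(<<"owner">>, Doc) =:= <<"nvw">>
    end.

The native query server has to be enabled in local.ini first
([native_query_servers] erlang = {couch_native_process, start_link,
[]}), after which a replication can reference the filter as
"filter": "core/by_owner".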