couchdb-user mailing list archives

From Stefan Klein <st.fankl...@gmail.com>
Subject Re: Maximum number of databases?
Date Sun, 09 Feb 2020 20:04:59 GMT
Hi,

On Sun, 9 Feb 2020 at 17:02, Marcus <couchdb@wordit.com> wrote:
>
> "Rule 5: Avoid the “database per user” anti-pattern like the plague
> If you’re building out a multi-user service on top of Cloudant, it is tempting to let
> each user store their data in a separate database under the application account. That works
> well, mostly, if the number of users is small."
>
> Source: https://www.ibm.com/cloud/blog/cloudant-best-and-worst-practices-part-1

I think the important part is:

"Now add the need to derive cross-user analytics. The way you do that
is to replicate all the user databases into a single analytics DB. All
good. Now, this app suddenly becomes successful, and the number of
users grow from 150 to 20,000. Now we have 20,000 replications just to
keep the analytics DB current. If we also want to run in an
active-active DR setup, we add another 20,000 replications and
basically the system will stop functioning."

> What are your personal experiences with large numbers of databases?

We do have a large number of databases (the per-user approach, self-hosted).
BUT we do not have any continuous replications running to sync the
databases with an analytics database.

From my understanding, a database _not_ in use is just some files lying
around in the filesystem.
So I do not think it makes sense to talk about a "maximum number of
databases"; it makes more sense to talk about a "maximum number of
_active_ databases" and a "maximum number of concurrent replications".

With the newish scheduling replicator¹, even a large number of
replications should not be much of an issue, since they no longer all
run concurrently. Still, the quote from "Rule 4" applies:

"The replicator scheduler has a limited number of simultaneous
replication jobs it is prepared to run. That means that as the number
of databases grows, the replication latency is likely to increase if
you try to replicate everything contained in an account."
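To get a feel for why latency grows, here is a rough back-of-the-envelope model, not how the scheduler is actually implemented. It assumes the scheduler keeps at most `max_jobs` replications running and, every `interval`, swaps in at most `max_churn` pending jobs in simple round-robin order. The parameter names mirror the `[replicator]` config options `max_jobs`, `max_churn` and `interval`; the defaults used here (500, 20 and 60000 ms) are what I remember from the docs and should be checked against your CouchDB version. The function name is made up for illustration.

```python
import math


def worst_case_wait_minutes(total_jobs: int, max_jobs: int = 500,
                            max_churn: int = 20,
                            interval_ms: int = 60_000) -> float:
    """Rough estimate (simplified round-robin model, an assumption) of how long
    the last pending replication job waits before the scheduler starts it."""
    pending = max(total_jobs - max_jobs, 0)  # jobs that do not fit in a slot
    if pending == 0:
        return 0.0
    # Each interval, at most max_churn pending jobs get swapped in.
    rotations = math.ceil(pending / max_churn)
    return rotations * interval_ms / 60_000  # convert milliseconds to minutes
```

Under this model, the 150 users from the quoted Rule 5 scenario never wait, while 20,000 continuous per-user replications give `worst_case_wait_minutes(20_000) == 975.0`, i.e. roughly 16 hours before the last job gets its turn. Adding another 20,000 for active-active DR roughly doubles that.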

Please take this with a grain of salt: I haven't played around with
the scheduling replicator yet, since we have a working system where we
do one-shot replications based on application knowledge, so far fewer
replications than the number of active users are ever triggered.
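A minimal sketch of the one-shot idea, assuming the application already knows which per-user databases changed since the last sync (how it knows is left out). It only builds `POST /_replicate` requests with `"continuous": false`, so each job finishes and frees its scheduler slot instead of staying active. The helper names, the `analytics` target and the `userdb-…` database names are hypothetical. Authentication and error handling are omitted.

```python
import json
import urllib.request


def build_one_shot_replication(couch_url: str, user_db: str,
                               target_db: str) -> urllib.request.Request:
    """Build (but do not send) a non-continuous POST /_replicate request
    from one per-user database into target_db."""
    body = {
        "source": f"{couch_url}/{user_db}",
        "target": f"{couch_url}/{target_db}",
        "continuous": False,  # one-shot: the job completes and frees its slot
    }
    return urllib.request.Request(
        url=f"{couch_url}/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def requests_for_active_users(couch_url: str, changed_user_dbs,
                              target_db: str = "analytics"):
    """One request per database the application knows has changed
    (hypothetical selection), rather than one per user."""
    return [build_one_shot_replication(couch_url, db, target_db)
            for db in changed_user_dbs]
```

Each request could then be sent with `urllib.request.urlopen(req)`. The point is that the number of replications tracks actual activity, not the total number of user databases.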

[1]: http://docs.couchdb.org/en/master/replication/replicator.html

-- 
Stefan
