couchdb-user mailing list archives

From Matt Adams <mad...@phantomware.ca>
Subject Best practices for scaling (many small databases vs. a large one)
Date Tue, 07 Dec 2010 06:18:28 GMT
Hi folks,

I am writing with regard to best practices for scaling and the relative 
impact of choosing many small databases vs. one (or more) very 
large databases.

Given the scenario I am working with, my original intent was to use 
many small databases.  In this situation users need access either to an 
entire database or to none of it, so the native CouchDB access permissions 
and/or a simple proxy would work quite well to secure data without the 
need for a more complicated authentication filter.  This also means that 
replication is all-or-nothing (I would not need to worry about 
partial replication of databases).  There are other reasons why I lean 
towards many small databases, but these are probably the primary ones 
(i.e., many smaller databases are simpler for me to implement for the 
purposes of getting CouchDB into play).
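
As a rough illustration of the all-or-nothing access model described above, 
here is a minimal sketch that builds the body of a per-database security 
object.  The database name and user names are hypothetical.  The sketch 
assumes the "members" key used by newer CouchDB releases; releases from 
around the date of this message used "readers" for the same list, so check 
the version in use.

```python
import json


def security_doc(member_names, admin_names=()):
    """Build the body for PUT /<db>/_security.

    Assumption: the "members" key. Older CouchDB releases named this
    list "readers" instead.
    """
    return {
        "admins": {"names": sorted(admin_names), "roles": []},
        "members": {"names": sorted(member_names), "roles": []},
    }


# Hypothetical database and user names, for illustration only.
doc = security_doc(["alice", "bob"], admin_names=["carol"])
print("PUT /team_a/_security")
print(json.dumps(doc, indent=2))
```

With one such document per database, a user either appears in a database's 
list or does not, which is the whole access decision.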

In this scenario most of the databases would be quite small (in the <1GB 
range), so we're not dealing with large data sets, and the ratio of users 
to databases is also fairly low.

If users were instead to share one very large database (solely for the 
purpose of making things easier to cluster), they would usually be 
accessing only a very small portion of it (e.g., most of the data 
would belong to many other small sets of users and would not likely be of 
interest to the user in question), and I would not want them to have any 
access to the remainder.

Problems arise in my mind when I start thinking about many thousands of 
these small databases.  What are the clustering implications?  Will I 
end up busier managing the replication of thousands of small databases 
for fail-over than I would be if I bit the bullet now and planned for a 
somewhat more complex setup?  Are things like BigCouch really more 
suited to clustering (fewer) very large databases, or do they thrive in 
environments where there are many small databases?
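
To make the replication overhead in the question above concrete, here is a 
minimal sketch that builds one continuous replication request per database, 
in the shape CouchDB accepts at POST /_replicate.  The standby host and the 
database names are hypothetical, and the sketch only builds the request 
bodies; it sends nothing.

```python
import json
from urllib.parse import quote


def replication_jobs(db_names, target_base, continuous=True):
    """Return one POST /_replicate body per user database.

    Names starting with "_" (system databases such as _users) are
    skipped; whether to replicate those is a separate decision.
    """
    base = target_base.rstrip("/")
    jobs = []
    for name in sorted(db_names):
        if name.startswith("_"):
            continue
        jobs.append({
            "source": name,
            # Percent-encode the name so characters like "/" are safe in a URL.
            "target": f"{base}/{quote(name, safe='')}",
            "continuous": continuous,
        })
    return jobs


# Hypothetical names; in practice the list could come from GET /_all_dbs.
jobs = replication_jobs(["_users", "team_b", "team_a"],
                        "http://standby.example:5984/")
for job in jobs:
    print(json.dumps(job))
```

The number of jobs grows linearly with the number of databases, so with 
many thousands of databases the fail-over setup means tracking many 
thousands of replications.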


Hopefully this will be enough information for anyone who wishes to chime 
in and give me some thoughts or other things to consider.  I am not 
looking for specific solutions at this point but instead trying to weigh 
the pros and cons of moving in a particular direction.


Thanks very much,

Matt







