couchdb-dev mailing list archives

From Stephen Bartell <snbart...@gmail.com>
Subject Re: Half-baked idea: incremental virtual databases
Date Wed, 30 Jan 2013 00:55:15 GMT
Nathan, I'm actually in the process of setting up a multi-tenant environment the canonical
way (a database per user), as you have.

I've seen the replication overhead get pretty intense, but I figure that scaling out to several
couches is the way to go once the overhead becomes unbearable. Actually, I was hoping BigCouch
would eventually be the answer.

Why is this not the case for you? 

In one of the links you provided, JasonSmith (on Stack Overflow) says that a database per user
is the only scalable way. It would be nice if he or someone here could weigh in on why/how that's
the only scalable way, especially in light of Nathan claiming the exact opposite.

sb

On Jan 29, 2013, at 10:44 AM, Nathan Vander Wilt <nate-lists@calftrail.com> wrote:

> # The problem
> 
> It's a fairly common "complaint" that CouchDB's database model does not support fine-grained
> control over reads. The canonical solution is a database per user:
> http://wiki.apache.org/couchdb/PerDocumentAuthorization#Database_per_user
> http://stackoverflow.com/a/4731514/179583
> 
> This does not scale.
> 
> 1. It complicates formerly simple backup/redundancy: now I need to make sure N replications
>    stay working and N databases have correct permissions, instead of just one "main" database.
>    Okay, write some scripts, deploy some cronjobs, it can be made to work...
> 
> 2. ...however, if data needs to be shared between users, this model *completely falls apart*.
>    Bi-directional continuous filtered replication between a "hub" and each user database is
>    extremely resource-intensive.
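
To make the bookkeeping in points 1 and 2 concrete, here is a minimal Python sketch that only generates configuration (no HTTP calls). It assumes CouchDB's `_security` object layout and the `_replicator` document fields `source`, `target`, `continuous`, `filter` and `query_params`; the `user_<name>` database naming scheme, the `hub` database name and the `app/by_reader` filter are hypothetical. Each user costs one database, one security object and two continuous replications, so N users mean 2N replication jobs to keep alive.

```python
def per_user_setup(users, hub="hub", hub_filter="app/by_reader"):
    """Build one _security object per user database plus two _replicator
    documents (hub -> user, filtered; user -> hub, unfiltered) per user.
    Names such as "user_<name>" and "app/by_reader" are hypothetical."""
    security = {}
    replications = []
    for user in users:
        db = f"user_{user}"  # hypothetical per-user database name
        # Only this user (plus admins) may read the database.
        security[db] = {
            "admins": {"names": [], "roles": []},
            "members": {"names": [user], "roles": []},
        }
        # Hub -> user: a filter function decides which shared docs this user sees.
        replications.append({
            "_id": f"{hub}_to_{db}",
            "source": hub,
            "target": db,
            "continuous": True,
            "filter": hub_filter,
            "query_params": {"user": user},
        })
        # User -> hub: push the user's own writes back to the shared hub.
        replications.append({
            "_id": f"{db}_to_{hub}",
            "source": db,
            "target": hub,
            "continuous": True,
        })
    return security, replications


security, replications = per_user_setup(["alice", "bob", "carol"])
print(len(security), len(replications))  # 3 6
```

Every one of those hub-to-user jobs runs its filter over the hub's whole changes feed, which is where the per-machine user ceiling comes from.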
> 
> I naïvely followed the Best Practices and ended up with a system that can barely support
> 100 users per machine due to replication overhead. Now, if I want to keep doing it "The Right
> Way", I need to cobble together some sort of rolling replication hack at best.
> 
> It's apparent the real answer for CouchDB security, right now, is to hide the database
> underneath some middleware boilerplate crap running as DB root. This is a well-explored pattern,
> by which I mean the database ends up with as many entry points as a sewer system has grates.
> 
> 
> # An improvement?
> 
> What if CouchDB let you define virtual databases that shared the underlying document data
> when possible, that updated incrementally (when queried) rather than continuously, and that
> could even be implemented internally in a fanout fashion?
> 
> - virtual databases would basically be part of the internal b-tree key hierarchy, sort of
>   like multiple root nodes sharing the branches as much as possible
> - sharing the underlying document data would almost halve the amount of disk space needed
>   versus a "master" database storing all the data, which is then copied to each user
> - updating incrementally would put less continuous memory pressure on the system
> - haven't actually done the maths, so I may be missing something, but wouldn't fanning out
>   changes internally from a master database through intermediate partitions reduce the
>   processing load?
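
The "multiple root nodes sharing the branches" idea resembles path copying in a persistent (copy-on-write) tree, which is also roughly how CouchDB's append-only b-tree writes updates. A minimal sketch, assuming a plain binary search tree rather than CouchDB's actual B+tree and using hypothetical document ids: deriving a second root that differs by one document copies only the nodes on the path to that document, and both roots share every other node.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    key: str
    value: str
    left: Optional[Node] = None
    right: Optional[Node] = None


def insert(root: Optional[Node], key: str, value: str) -> Node:
    """Return a new root; only the nodes on the path to `key` are copied."""
    if root is None:
        return Node(key, value)
    if key < root.key:
        return Node(root.key, root.value, insert(root.left, key, value), root.right)
    if key > root.key:
        return Node(root.key, root.value, root.left, insert(root.right, key, value))
    return Node(key, value, root.left, root.right)  # replace value, reuse subtrees


def node_ids(root: Optional[Node]) -> set[int]:
    """Identities of every node reachable from `root`."""
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if node is not None:
            seen.add(id(node))
            stack.extend((node.left, node.right))
    return seen


# "Master" root holding 15 hypothetical documents, inserted in balanced order.
master = None
for i in (7, 3, 11, 1, 5, 9, 13, 0, 2, 4, 6, 8, 10, 12, 14):
    master = insert(master, f"doc{i:02d}", "v1")

# A second root (a "virtual database") that differs by one document.
view = insert(master, "doc06", "v2")

shared = node_ids(master) & node_ids(view)
print(len(node_ids(master)), len(node_ids(view)), len(shared))  # 15 15 11
```

Here the extra storage for the second root is one root-to-leaf path (4 nodes) instead of a full 15-node copy. A real design would also have to handle deletions and per-user subsets, which this sketch leaves out.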
> 
> Basically, rather than copying each updated document to a master database and then filtering
> every one of the M updates through N instances of couchjs, CouchDB could internally build a
> tree of combined filters: say, the master database filters into log(N) hidden partitions at
> the first level, and accepted changes trickle through only the relevant further layers. (In a
> way, this is somewhat at odds with the incremental nature; maybe it does make sense to pay an
> amortized cost on write rather than on read.)
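
A rough cost model of the two approaches, as a minimal Python sketch. It assumes each change carries a hypothetical set of reader names and that a partition's combined filter is simply "does any of my users appear in the readers set"; real CouchDB filters run arbitrary JavaScript in couchjs, so this only counts filter evaluations and does not model their cost. With 64 users and a fanout of 2, the flat model runs 64 filter calls per change, while the tree runs 13 for a document with a single reader.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Partition:
    users: frozenset[str]
    children: list[Partition] = field(default_factory=list)


def build(users: list[str], fanout: int = 2) -> Partition:
    """Recursively split users into `fanout` hidden partitions until each leaf holds one user."""
    node = Partition(frozenset(users))
    if len(users) > 1:
        step = -(-len(users) // fanout)  # ceiling division
        node.children = [build(users[i:i + step], fanout) for i in range(0, len(users), step)]
    return node


def route_flat(readers: frozenset[str], users: list[str]) -> tuple[list[str], int]:
    """Flat model: every per-user filter sees every change."""
    hits = [u for u in users if u in readers]
    return hits, len(users)  # one filter call per user


def route_tree(readers: frozenset[str], node: Partition) -> tuple[list[str], int]:
    """Tree model: a combined filter per partition prunes whole subtrees."""
    calls = 1  # evaluate this partition's combined filter
    if not (node.users & readers):
        return [], calls
    if not node.children:
        return sorted(node.users), calls
    hits: list[str] = []
    for child in node.children:
        child_hits, child_calls = route_tree(readers, child)
        hits += child_hits
        calls += child_calls
    return hits, calls


users = [f"user{i:02d}" for i in range(64)]
tree = build(users)
print(route_flat(frozenset({"user17"}), users))  # (['user17'], 64)
print(route_tree(frozenset({"user17"}), tree))   # (['user17'], 13)
```

The trade-off runs the other way for widely shared documents: a change readable by all 64 users visits all 127 partitions, nearly double the flat cost, so the tree only helps when most changes are relevant to a few users.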
> 
> 
> # The urgency
> 
> Maybe this *particular* solution isn't really a solution, but we need one:
> 
> If replicating amongst per-user databases is the only correct way to implement document-level
> read permissions, CouchDB **NEEDS** built-in support for a scalable way of doing so.
> 
> There are plenty of other feature requests I could troll the list with regarding CouchApps,
> but this one is key; everything else I've been able to work around with a little reverse
> proxy here and an external process there. Without scalable read-level security, I see no
> particular raison d'être for Apache CouchDB: if CouchDB can't support direct HTTP access in
> production, then it's just another centralized database.
> 
> 
> thanks,
> -natevw

