couchdb-dev mailing list archives

From: Nathan Vander Wilt <nate-li...@calftrail.com>
Subject: Half-baked idea: incremental virtual databases
Date: Tue, 29 Jan 2013 18:44:52 GMT

# The problem

It's a fairly common "complaint" that CouchDB's database model does not support fine-grained
control over reads. The canonical solution is a database per user:
http://wiki.apache.org/couchdb/PerDocumentAuthorization#Database_per_user
http://stackoverflow.com/a/4731514/179583

This does not scale.

1. It complicates formerly simple backup/redundancy: now I need to make sure N replications
stay working and N databases have correct permissions, instead of just one "main" database. Okay,
I can write some scripts and deploy some cron jobs; it can be made to work...

2. ...however, if data needs to be shared between users, this model *completely falls apart*.
Bi-directional continuous filtered replication between a "hub" and each user database is extremely
resource intensive.

I naïvely followed the Best Practices and ended up with a system that can barely support
100 users per machine due to replication overhead. Now if I want to keep doing it "The
Right Way", I need to cobble together some sort of rolling replication hack at best.
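A rough back-of-the-envelope model of that overhead, in Python and purely illustrative (the function names are made up; nothing here is a CouchDB API). It assumes one continuous replication in each direction per user database, and that each hub-to-user filtered replication runs its filter over every change in the hub, i.e. one couchjs filter call per change per user. Checkpoints, HTTP, and conflicts are ignored.

```python
# Illustrative cost model only; the replication topology is an assumption.


def replication_jobs(n_users: int) -> int:
    """Continuous replications to keep alive: hub->user and user->hub per user."""
    return 2 * n_users


def hub_filter_calls(m_updates: int, n_users: int) -> int:
    """Filter evaluations when each of the N hub->user replications
    runs its own filter over every one of the M hub changes."""
    return m_updates * n_users
```

With the 100 users mentioned above and 1,000 hub updates, that is 200 replication jobs and 100,000 filter calls. Since M itself tends to grow with N (more users, more writes), the filter work grows roughly quadratically with the number of users.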

It's apparent the real answer for CouchDB security, right now, is to hide the database underneath
some middleware boilerplate crap running as DB root. This is a well-explored pattern, by which
I mean the database ends up with as many entry points as a sewer system has grates.


# An improvement?

What if CouchDB let you define virtual databases that shared the underlying document data
when possible, that updated incrementally (when queried) rather than continuously, and that could
even be implemented internally in a fanout fashion?

- virtual databases would basically be part of the internal b-tree key hierarchy, sort of
like multiple root nodes sharing the branches as much as possible
- sharing the underlying document data would almost halve the amount of disk needed versus
a "master" database storing all the data, which is then copied to each user
- updating incrementally would put less continuous memory pressure on the system
- haven't actually done the maths, so I may be missing something, but wouldn't fanning out
changes internally from a master database through intermediate partitions reduce the processing
load?
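A minimal in-memory sketch of the sharing and lazy-update bullets, in Python. Everything here is hypothetical (none of these classes correspond to CouchDB internals): document bodies live once in a master store, and each "virtual database" keeps only a filter, a set of visible document ids, and a checkpoint into the master's change log, catching up only when it is queried. It does not model the b-tree sharing itself, revisions, or deletions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

# Hypothetical model of the proposal; not how CouchDB stores anything today.
Doc = Dict[str, object]


@dataclass
class MasterDB:
    """Stores each document body exactly once, plus an append-only change log."""
    docs: Dict[str, Doc] = field(default_factory=dict)
    changes: List[str] = field(default_factory=list)  # doc id at each sequence number

    def put(self, doc_id: str, doc: Doc) -> None:
        self.docs[doc_id] = doc
        self.changes.append(doc_id)


@dataclass
class VirtualDB:
    """A filter, the set of doc ids it currently admits, and a checkpoint.
    Bodies are never copied; reads go straight to the master's storage."""
    master: MasterDB
    accept: Callable[[Doc], bool]
    ids: Set[str] = field(default_factory=set)
    since: int = 0  # number of master changes already processed

    def catch_up(self) -> int:
        """Process only the changes made since the last query; return how many."""
        pending = self.master.changes[self.since:]
        for doc_id in pending:
            if self.accept(self.master.docs[doc_id]):
                self.ids.add(doc_id)
            else:
                self.ids.discard(doc_id)  # e.g. the doc was unshared
        self.since += len(pending)
        return len(pending)

    def all_docs(self) -> Dict[str, Doc]:
        self.catch_up()  # the incremental update happens lazily, at read time
        return {doc_id: self.master.docs[doc_id] for doc_id in sorted(self.ids)}


if __name__ == "__main__":
    master = MasterDB()
    alice = VirtualDB(master, lambda d: "alice" in d["readers"])
    bob = VirtualDB(master, lambda d: "bob" in d["readers"])
    master.put("todo", {"readers": ["alice"]})
    master.put("shared", {"readers": ["alice", "bob"]})
    print(list(alice.all_docs()), list(bob.all_docs()))
    # Both virtual databases hand back the very same stored body, not a copy.
    print(alice.all_docs()["shared"] is bob.all_docs()["shared"])
```

A virtual database nobody reads costs nothing until it is queried; the price is that the first read after a burst of writes pays for the whole backlog.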

Basically, rather than copying each user's document update into a master database and then
filtering every one of M updates through N instances of couchjs, CouchDB could internally build
a tree of combined filters: say, the master database filters into log(N) hidden partitions at the
first level, and accepted changes trickle down through only the relevant further layers.
(In a way, this is at odds with the incremental nature of the idea; maybe it does make sense
to pay an amortized cost on writes rather than on reads.)
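A toy Python model of that fanout, counting filter calls for a single change. The filter semantics are assumed for illustration and are not CouchDB's: each user's filter is just "is this user among the document's readers", and a partition's combined filter is a single set-intersection test, i.e. about as cheap as one user filter. If a combined filter cost as much as running all of its members' filters, the savings would disappear.

```python
import math
from typing import List, Set, Tuple

# Illustrative model only; filter semantics and costs are assumptions.


def flat_fanout(readers: Set[str], users: List[str]) -> Tuple[Set[str], int]:
    """Current shape: every per-user filter looks at every change."""
    accepted = {u for u in users if u in readers}
    return accepted, len(users)  # one filter call per user


def tree_fanout(readers: Set[str], groups: List[List[str]]) -> Tuple[Set[str], int]:
    """Hypothetical two-level tree: one combined filter per partition, and only
    partitions that accept the change run their members' individual filters."""
    accepted: Set[str] = set()
    calls = 0
    for members in groups:
        calls += 1  # combined filter: does any reader belong to this partition?
        if readers & set(members):
            calls += len(members)  # per-user filters, this partition only
            accepted |= {u for u in members if u in readers}
    return accepted, calls


def make_groups(users: List[str], n_groups: int) -> List[List[str]]:
    """Split users into roughly n_groups contiguous partitions."""
    size = math.ceil(len(users) / n_groups)
    return [users[i:i + size] for i in range(0, len(users), size)]
```

With 1,024 users split into log2(1024) = 10 partitions (nine of 103 users, one of 97), a change readable by a single user costs 10 + 103 = 113 filter calls instead of 1,024. A change shared with everybody costs 10 + 1,024, slightly worse than the flat scheme, so the tree only pays off when most changes concern a few partitions; a deeper tree would push the common case further toward logarithmic cost.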


# The urgency

Maybe this *particular* solution isn't really a solution, but we need one:

If replicating amongst per-user databases is the only correct way to implement document-level
read permissions, CouchDB **NEEDS** built-in support for a scalable way of doing so.

There are plenty of other feature requests I could troll the list with regarding CouchApps.
But this one is key; everything else I've been able to work around behind a little reverse
proxy here and in front of an external process there. Without scalable read-level security,
I see no particular raison d'être for Apache CouchDB — if CouchDB can't support direct
HTTP access in production in general, then it's just another centralized database.


thanks,
-natevw