From: Nathan Vander Wilt
To: dev@couchdb.apache.org
Subject: Half-baked idea: incremental virtual databases
Date: Tue, 29 Jan 2013 10:44:52 -0800

# The problem

It's a fairly common "complaint" that CouchDB's database model does not support fine-grained control over reads. The canonical solution is a database per user:

http://wiki.apache.org/couchdb/PerDocumentAuthorization#Database_per_user
http://stackoverflow.com/a/4731514/179583

This does not scale.

1. It complicates formerly simple backup/redundancy: now I need to make sure N replications stay working, N databases have correct permissions, instead of just one "main" database. Okay, write some scripts, deploy some cronjobs, can be made to work...

2. ...however, if data needs to be shared between users, this model *completely falls apart*. Bi-directional continuous filtered replication between a "hub" and each user database is extremely resource intensive.

I naïvely followed the Best Practices and ended up with a system that can barely support 100 users to a machine due to replication overhead. Now if I want to continue doing it "The Right Way" I need to cobble together some sort of rolling replication hack at best.
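To make the overhead in point 2 concrete, here is a minimal sketch of the replication documents a hub-plus-per-user setup would need. The field names (`source`, `target`, `continuous`, `filter`, `query_params`) follow CouchDB's replication document format; the database naming scheme, the `acl/readable_by` filter, and the choice to filter only the hub-to-user direction are assumptions made purely for illustration.

```python
def replication_docs(hub, users):
    """Build one pair of continuous replication documents per user."""
    docs = []
    for user in users:
        user_db = f"userdb-{user}"  # hypothetical naming scheme
        # hub -> user: filtered so the user only receives documents they may read
        docs.append({
            "_id": f"{hub}-to-{user_db}",
            "source": hub,
            "target": user_db,
            "continuous": True,
            "filter": "acl/readable_by",  # hypothetical design doc and filter name
            "query_params": {"user": user},
        })
        # user -> hub: push the user's own writes back to the shared database
        docs.append({
            "_id": f"{user_db}-to-{hub}",
            "source": user_db,
            "target": hub,
            "continuous": True,
        })
    return docs


docs = replication_docs("hub", [f"user{i}" for i in range(100)])
print(len(docs))  # two long-running replications per user
```

Every one of these documents corresponds to a replication that stays open, and each filtered one runs the filter function against every change in the hub, which is where the per-user cost piles up.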
It's apparent the real answer for CouchDB security, right now, is to hide the database underneath some middleware boilerplate crap running as DB root. This is a well-explored pattern, by which I mean the database ends up with as many entry points as a sewer system has grates.

# An improvement?

What if CouchDB let you define virtual databases, that shared the underlying document data when possible, that updated incrementally (when queried) rather than continuously, that could even internally be implemented in a fanout fashion?

- virtual databases would basically be part of the internal b-tree key hierarchy, sort of like multiple root nodes sharing the branches as much as possible
- sharing the underlying document data would almost halve the amount of disk needed versus a "master" database storing all the data which is then copied to each user
- updating incrementally would put less continuous memory pressure on the system
- haven't actually done the maths, so I may be missing something, but wouldn't fanning out changes internally from a master database through intermediate partitions reduce the processing load?

Basically, rather than copying a document to a master database each time a user updates it, then filtering every M updates through N instances of couchjs, CouchDB could internally build a tree of combined filters: say, the master database filters to log(N) hidden partitions at the first level, and accepted changes would trickle through only the relevant further layers. (In a way, this is kind of at odds with the incremental nature; maybe it does make sense to pay an amortized cost on write rather than on reads.)

# The urgency

Maybe this *particular* solution isn't really a solution, but we need one:

If replicating amongst per-user databases is the only correct way to implement document-level read permissions, CouchDB **NEEDS** built-in support for a scalable way of doing so.
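As a rough illustration of the "tree of combined filters" idea from the "An improvement?" section, the sketch below routes a change through a tree whose inner nodes hold the union of their children's filters. It is not anything CouchDB implements. It assumes each user's filter can be reduced to a set of channel names, which is what makes filters combinable at all; real CouchDB filters are arbitrary JavaScript functions and could not be merged this simply. `Node`, `build_tree`, and `route` are hypothetical names.

```python
class Node:
    def __init__(self, channels, children=(), user=None):
        self.channels = frozenset(channels)  # everything accepted at or below this node
        self.children = list(children)
        self.user = user  # set only on leaves


def build_tree(user_channels, fanout=2):
    """Group per-user filters into partitions whose filter is the union of their children."""
    level = [Node(ch, user=u) for u, ch in sorted(user_channels.items())]
    while len(level) > 1:
        grouped = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            combined = frozenset().union(*(n.channels for n in group))
            grouped.append(Node(combined, children=group))
        level = grouped
    return level[0]


def route(node, channel, stats):
    """Return the users that should receive a change, counting filter checks."""
    stats["checks"] += 1
    if channel not in node.channels:
        return []  # the whole subtree is skipped
    if node.user is not None:
        return [node.user]
    hits = []
    for child in node.children:
        hits.extend(route(child, channel, stats))
    return hits


# Eight users, two per team; a change on "team3" concerns only user6 and user7.
users = {f"user{i}": {f"team{i // 2}"} for i in range(8)}
tree = build_tree(users)
stats = {"checks": 0}
print(route(tree, "team3", stats), stats["checks"])
```

With a flat layout every change is checked against all N user filters. Here a rejected partition prunes its whole subtree, so the number of checks depends on the tree depth and on how many users actually match. The price is the work of maintaining the combined filters, which is the write-side amortized cost mentioned above.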
There are plenty of other feature requests I could troll the list with regarding CouchApps. But this one is key; everything else I've been able to work around behind a little reverse proxy here and in front of an external process there. Without scalable read-level security, I see no particular raison d'être for Apache CouchDB: if CouchDB can't support direct HTTP access in production in general, then it's just another centralized database.

thanks,
-natevw