From: Martin Scholl
Date: Mon, 16 Feb 2009 17:54:29 +0100
To: dev@couchdb.apache.org
Subject: Partial replication -orelse- sending interpreted data to another server
Message-ID: <49999A45.9040302@diskware.net>

Hello all,

at #couchdb we discussed how partial replication
could be implemented. We discussed pros and cons, and davisp asked me
to write an email to dev@. Well, here it is...

===

Basically, two approaches to replication were discussed (names freely
added, please substitute more appropriate ones):

a) "Push-pull scenario": a client wishing to get some documents
   replicated sends the replicating server a design doc containing a
   predicate. The predicate determines which docs are to be replicated
   ("pull replication").
b) "Pull-pull scenario": a DB admin adds a set of design docs, which a
   client then triggers to retrieve the docs / the set of docids.

While a) is far more versatile, variant b) leaves the admin with more
control over what happens with his/her database.

My concern with a) is that
- it breaks with the principle "payload is payload, and code is code",
- it opens the door to several DoS attacks. Imagine a predicate doing
  while(1) {}. Setting the per-doc timeout to a low value (as Jan
  suggested) doesn't really solve the CPU hogging issue.

So, this all boils down to these questions:
- What principles for selective replication should be employed?
- How can we establish a system of trust for foreign JavaScript?
  (e.g. code signing and all that stuff)
- Is the solution for all this "make the replication regime a db
  configuration option"?

Although it would break with several of CouchDB's traditions, a solution
that is both secure and versatile could be a descriptive approach.
Something like this (simplified JSON):

    replicate: {
      input: {
        : { type: int; default: 42; },
        : { type: string; default: "wiki/" }
      }
      filter: { }
    };

with a filter being recursively defined:

    filter: { type: }

The filter family and, or, xor, not describes how the recursive
sub-filters are composed. A second type of filter is

    filter: { type: match; filter: }

with the inner filter value being any JSON object, which may contain
constructs '$$' that are dynamically substituted with the script's
input variables.
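
Purely as an illustration of the intended semantics (this is not CouchDB
code and not an implementation proposal), the recursive filter could be
evaluated roughly as in the Python sketch below. Several details are
assumptions made only for the example: the key "filters" for the list of
sub-filters, the placeholder form "$name$", xor meaning "an odd number
of sub-filters match", not taking exactly one sub-filter, and a
match-filter meaning "every key in the pattern is present in the doc
with an equal value".

```python
from typing import Any, Mapping


def resolve_inputs(declared: Mapping[str, Mapping[str, Any]],
                   supplied: Mapping[str, Any]) -> dict:
    """Use the caller's value for each declared input, else its default."""
    return {name: supplied.get(name, spec.get("default"))
            for name, spec in declared.items()}


def substitute(pattern: Any, inputs: Mapping[str, Any]) -> Any:
    """Replace "$name$" strings (assumed syntax) with input values."""
    if isinstance(pattern, dict):
        return {k: substitute(v, inputs) for k, v in pattern.items()}
    if isinstance(pattern, list):
        return [substitute(v, inputs) for v in pattern]
    if (isinstance(pattern, str) and len(pattern) > 2
            and pattern.startswith("$") and pattern.endswith("$")):
        name = pattern[1:-1]
        if name in inputs:
            return inputs[name]
    return pattern


def matches(pattern: Any, doc: Any) -> bool:
    """Assumed match semantics: the pattern is a sub-structure of the doc."""
    if isinstance(pattern, dict):
        return isinstance(doc, dict) and all(
            key in doc and matches(value, doc[key])
            for key, value in pattern.items())
    return pattern == doc


def evaluate(flt: Mapping[str, Any], doc: Any,
             inputs: Mapping[str, Any]) -> bool:
    """Decide whether a doc passes a (possibly nested) filter."""
    kind = flt["type"]
    if kind == "match":
        return matches(substitute(flt["filter"], inputs), doc)
    # "filters" is a hypothetical key name for the list of sub-filters.
    results = [evaluate(sub, doc, inputs) for sub in flt["filters"]]
    if kind == "and":
        return all(results)
    if kind == "or":
        return any(results)
    if kind == "xor":
        return sum(results) % 2 == 1  # assumption: odd parity
    if kind == "not":
        if len(results) != 1:
            raise ValueError("not expects exactly one sub-filter")
        return not results[0]
    raise ValueError(f"unknown filter type: {kind!r}")


# Example: replicate docs whose "type" equals the "kind" input,
# or docs that are marked public. All names here are made up.
spec = {
    "input": {"kind": {"type": "string", "default": "wiki"}},
    "filter": {"type": "or", "filters": [
        {"type": "match", "filter": {"type": "$kind$"}},
        {"type": "match", "filter": {"meta": {"public": True}}},
    ]},
}
```

With the defaults, resolve_inputs(spec["input"], {}) gives kind = "wiki",
so a doc such as {"type": "wiki", "title": "Home"} passes, while
{"type": "blog"} does not. Note that no user-supplied code runs at any
point; the server only walks a data structure.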

With such a descriptive filter specification we can

a) match several types of documents, by using an or-filter composed of
   several match-filters as sub-filters;
b) more importantly, start reasoning about the filters and enforce
   security constraints (e.g. max. filter depth: 3, only or-filters
   allowed, only 2 match-filters allowed, etc.).

I would like to hear what you think about all the different approaches.

Martin

P.S.: Again, sorry for not being able to provide code. The sole purpose
of this email is to document some thoughts and others' ideas.