couchdb-dev mailing list archives

From Damien Katz <dam...@apache.org>
Subject Re: Partial replication -orelse- sending interpreted data to another server
Date Mon, 16 Feb 2009 18:02:55 GMT
The problem with b), admin-enforced replication policies, is that it's
not really possible. The replicator is just an agent of the user who
invoked it; it can choose to follow rules set by the admin, or it can
follow its own rules. You can't give a user access to the database
but enforce that they can only replicate it the admin-specified way.
If the user can perform a certain update in the database using regular
methods, he can also do so via the replicator.

Therefore the answer is to not distinguish between replicated updates
and direct updates. Instead, enforce the same security rules either way.
This user can update this document with these values, or he can't.
It doesn't matter whether the update is replicated or direct.
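
A minimal sketch of that idea, not CouchDB's actual API: every name here
(`Update`, `validate_update`, `apply_update`) and the owner-only rule are
hypothetical, chosen only to show that the check never looks at where the
write came from.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Update:
    user: str              # user on whose behalf the write is made
    doc_id: str
    new_doc: dict
    via_replication: bool  # recorded, but deliberately never consulted


def validate_update(update: Update, old_doc: Optional[dict]) -> bool:
    """Hypothetical rule: only a document's owner may create or change it."""
    if old_doc is None:
        # New document: the creator must list themselves as owner.
        return update.new_doc.get("owner") == update.user
    # Existing document: the owner must match before and after the write.
    return (old_doc.get("owner") == update.user
            and update.new_doc.get("owner") == update.user)


def apply_update(db: dict, update: Update) -> bool:
    """Apply the write if the rule allows it; one code path for both sources."""
    if not validate_update(update, db.get(update.doc_id)):
        return False
    db[update.doc_id] = update.new_doc
    return True
```

With this shape, a write that is rejected when made directly is rejected
identically when it arrives through the replicator, because the replicator
acts as the same user.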

This, like much of CouchDB, is very much inspired by how Lotus Notes
already works. Notes does partial replication, has signed design
elements and scripts, and cryptographically verifiable users. The Notes
model treats the user who replicates the update as the person
performing the update. The Notes security model is more rigid and
thoroughly integrated than what I plan. CouchDB instead will provide
all the hooks necessary to build a Notes-like security system, but
will actually have more flexibility here, as we can customize the
security model more.

If you are worried about runaway scripts and scripts that use too many
resources, then the only real option is to provide a non-Turing-complete
query language, or limit the code to a subset of the
language. But even though you'll know it terminates, it's hard to
limit how expensive the operations are and how often the code is invoked.
There is never really a good option here; anything constrained enough
to make time/space guarantees is often too limited to be useful.
Timeouts suck, but so does everything else.

-Damien


On Feb 16, 2009, at 11:54 AM, Martin Scholl wrote:

> Hello all,
>
>
> at #couchdb we discussed how partial replication could be implemented.
> We discussed pros and cons, with davisp requesting I should write an
> email to dev@. Well, here it is...
>
> ===
>
> Basically, 2 approaches to replication were discussed (names freely
> added, please substitute with more appropriate ones):
>
> a) "Push-pull scenario":
> a client wishing to get some documents replicated sends the
> replicating server a design doc with a predicate in it. The predicate
> determines which docs are to be replicated ("pull replication")
>
> b) "Pull-pull scenario":
> - a DB admin adds a set of design docs which a client then triggers to
> retrieve the docs/the set of docids.
>
> While a) is way more versatile, variant b) leaves the admin with more
> control over what happens with his/her database.
>
> My concern with a) is
> - it breaks with the principle "payload is payload, and code is code",
> - it opens the door to several DoS attacks. Imagine a predicate doing
> while(1) {}. Setting per doc timeout to a low value (as Jan suggested)
> doesn't really solve the CPU hogging issue.
>
> So, this all boils down to the questions:
> - what principles for selective replication should be employed?
> - how can we establish a system of trust for foreign JavaScript
> (e.g. code-signing and all that stuff)?
> - is the solution for all this "make the replication regime a db
> configuration option"?
>
> Although it would break with several of CouchDB's traditions, a
> solution that is secure and versatile could be a descriptive approach.
> Something like this (simplified json):
>
> replicate: {
>  input: {
>     <Param1>: { type: int; default: 42; },
>     <Param2>: { type: string; default: "wiki/"}
>  }
>
>  filter: {
>     <set of filters>
>  }
> };
>
> with a filter being recursively defined:
>  filter: {
>    type: <and,or,xor,not>;
>    filter: <recursive definition of a filter>
>  }
>
> With the filter-family and,or,xor,not describing how the recursive
> sub-filters should be composed, and a 2nd type of filter:
>
> filter: {
>  type: match;
>  filter: <Json struct>
> }
>
> with <Json struct> being any json object which may embody constructs
> '$<Param1>$' which are dynamically substituted with the script's input
> variables.
> With such a descriptive filter description we can match
> a) several types of documents by using an or-filter together with
> several sub-filters which are match-filters
> b) more importantly: we can start reasoning on the filters and enforce
> several security constraints (e.g. max. filter depth: 3, only or
> filters allowed, only 2 match filters allowed, etc.).
>
> I would like to hear what you think about all the different
> approaches.
>
>
> Martin
>
> P.S.: Again, sorry for not being able to provide code. The sole
> purpose of this email is to document some thoughts and others' ideas.
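
The descriptive filter proposed in the quoted message can be sketched
roughly as follows. This is only one possible reading, under several
assumptions that are not in the original: filters are plain dicts, the
and/or/xor filters carry a list under a hypothetical `filters` key, xor means
"an odd number of sub-filters match", a match filter compares values for
exact equality, and `$Name$` strings are replaced by input parameters.

```python
from typing import Any


def substitute(template: Any, params: dict) -> Any:
    """Replace '$name$' string values with the matching input parameter."""
    if isinstance(template, dict):
        return {k: substitute(v, params) for k, v in template.items()}
    if (isinstance(template, str) and len(template) > 2
            and template[0] == template[-1] == "$"):
        return params[template[1:-1]]
    return template


def matches(pattern: Any, doc: Any) -> bool:
    """A dict pattern matches if every key it names matches in the doc."""
    if isinstance(pattern, dict):
        return isinstance(doc, dict) and all(
            k in doc and matches(v, doc[k]) for k, v in pattern.items()
        )
    return pattern == doc


def evaluate(flt: dict, doc: dict, params: dict) -> bool:
    """Evaluate a filter tree against one document; always terminates."""
    kind = flt["type"]
    if kind == "match":
        return matches(substitute(flt["filter"], params), doc)
    if kind == "not":
        return not evaluate(flt["filter"], doc, params)
    results = [evaluate(f, doc, params) for f in flt["filters"]]
    if kind == "and":
        return all(results)
    if kind == "or":
        return any(results)
    if kind == "xor":
        return sum(results) % 2 == 1  # assumed meaning: odd number match
    raise ValueError(f"unknown filter type: {kind!r}")


def depth(flt: dict) -> int:
    """Nesting depth of a filter, usable for a 'max. filter depth' check."""
    if flt["type"] == "match":
        return 1
    children = [flt["filter"]] if flt["type"] == "not" else flt["filters"]
    return 1 + max((depth(c) for c in children), default=0)
```

Because the filter is data rather than code, a server could inspect it
(for example by rejecting anything with `depth(flt) > 3`) before running it,
which is the kind of reasoning about filters the proposal describes.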

