couchdb-dev mailing list archives

From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: Partial replication -orelse- sending interpreted data to another server
Date Mon, 16 Feb 2009 18:18:11 GMT
On Mon, Feb 16, 2009 at 1:02 PM, Damien Katz <damien@apache.org> wrote:
> The problem with b) admin-enforced replication policies is that it's not
> really possible. The replicator is just an agent of the user who invoked it;
> it can choose to follow some rules set by the admin, or it can follow its own
> rules. You can't give a user access to the database, but enforce that they
> can only replicate it the admin-specified way. If the user can perform a
> certain update in the database using regular methods, he can also do so via
> the replicator.
>

As we were mulling over the security considerations of allowing users
to run arbitrary code on a CouchDB node, I had the idea to allow the
node's admin to store a set of predefined methods that could be used
as replication endpoints. As in, instead of posting a JS function, we
post a {"filter": "foo/name"} member and it pulls the replication
filter code from {"replication_filters": {"name": "function...."}}
defined in "_design/foo". This way we have the full benefit of using
JS to do our filtering while preventing arbitrary code execution.
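
A rough sketch of that lookup, in Python purely for illustration. The "replication_filters" member and the "foo/name" reference format are taken from the paragraph above; the function name, the in-memory design_docs dict, and the JS source string are hypothetical, and this is not how CouchDB itself implements anything.

```python
# Hypothetical sketch: resolve a named replication filter from a design doc.
# Only the "replication_filters" member and the "ddoc/name" reference come
# from the proposal; all other names here are made up for illustration.

def resolve_filter(design_docs, filter_ref):
    """Return the stored filter source for a reference like "foo/name".

    design_docs maps a design doc id (e.g. "_design/foo") to its body.
    A KeyError is raised when the admin has not defined the filter, so
    the client can only pick from code that is already on the node.
    """
    ddoc_name, sep, filter_name = filter_ref.partition("/")
    if not sep or not ddoc_name or not filter_name:
        raise ValueError("filter reference must look like 'ddoc/name'")
    ddoc = design_docs["_design/" + ddoc_name]
    return ddoc["replication_filters"][filter_name]


# Example data: the JS source is a placeholder, not a defined filter API.
design_docs = {
    "_design/foo": {
        "replication_filters": {
            "name": "function(doc) { return doc.type == 'wiki'; }"
        }
    }
}

request = {"filter": "foo/name"}
source = resolve_filter(design_docs, request["filter"])
```

The point of the design is that the request carries only a name, never code, so the set of functions that can run is exactly what the admin stored.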

> Therefore the answer is to not distinguish between replicated updates and
> direct updates. Instead, enforce the same security rules either way. This user
> can update this document with these values, or he can't. Doesn't matter if
> it's replicated or direct.
>
> This, like much of CouchDB, is very much inspired by how Lotus Notes already
> works. Notes does partial replication, has signed design elements and
> scripts and cryptographically verifiable users. The Notes model treats the
> user who replicates the update as the person performing the update. The
> Notes security model is more rigid and thoroughly integrated than what I
> plan. CouchDB instead will provide all the hooks necessary to build a
> Notes-like security system, but will actually have more flexibility here,
> as we can customize the security model more.
>
> If you are worried about runaway scripts and scripts that use too many
> resources, then the only real option is to provide a non-Turing-complete
> query language, or limit the code to a subset of the language. But even
> though you'll know it terminates, it's hard to limit how expensive the
> operations are and how often it's invoked. There is never really a good
> option here; anything constrained enough to make time/space guarantees is
> often too limited to be useful. Timeouts suck, but so does everything else.
>

That's a much clearer explanation than my initial "ewwww" reaction.
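
To make the "same rules either way" point concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the "owner" field, the may_update rule, and the dict-backed databases are invented, and the replicator shown is a toy loop, not CouchDB's.

```python
# Hypothetical sketch: one permission check shared by both write paths.

def may_update(user, old_doc, new_doc):
    """Illustrative rule: only the document's owner may write it."""
    owner = (old_doc or new_doc).get("owner")
    return owner == user


def save(db, user, new_doc):
    """The single place where writes are checked and applied."""
    old_doc = db.get(new_doc["_id"])
    if not may_update(user, old_doc, new_doc):
        raise PermissionError("%s may not update %s" % (user, new_doc["_id"]))
    db[new_doc["_id"]] = new_doc


def direct_update(db, user, doc):
    """A regular update made by the user."""
    save(db, user, doc)


def replicate(source, target, user):
    """The replicator is just an agent of the invoking user.

    Each document goes through the same save() as a direct update;
    documents the user may not write are skipped.
    """
    written = []
    for doc in source.values():
        try:
            save(target, user, doc)
            written.append(doc["_id"])
        except PermissionError:
            pass
    return written


target = {"a": {"_id": "a", "owner": "alice", "v": 1}}
source = {
    "a": {"_id": "a", "owner": "alice", "v": 2},
    "b": {"_id": "b", "owner": "bob", "v": 1},
}
written = replicate(source, target, "bob")
```

Here bob's replication writes only the document he owns, which is exactly what he could have done with a direct update.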

HTH,
Paul Davis

> -Damien
>
>
> On Feb 16, 2009, at 11:54 AM, Martin Scholl wrote:
>
>> Hello all,
>>
>>
>> At #couchdb we discussed how partial replication could be implemented.
>> We discussed pros and cons, and davisp asked me to write an
>> email to dev@. Well, here it is...
>>
>> ===
>>
>> Basically, 2 approaches to replication were discussed (names freely
>> added, please substitute with more appropriate ones):
>>
>> a) "Push-pull scenario":
>> a client wishing to get some documents replicated sends the
>> replicating server a design doc with a predicate in it. The predicate
>> determines which docs are to be replicated ("pull replication").
>>
>> b) "Pull-pull scenario":
>> - a DB admin adds a set of design docs, which a client then triggers to
>> retrieve the docs or the set of docids.
>>
>> While a) is way more versatile, variant b) leaves the admin with more
>> control over what happens with his/her database.
>>
>> My concern with a) is
>> - it breaks with the principle "payload is payload, and code is code",
>> - it opens the door to several DoS attacks. Imagine a predicate doing
>> while(1) {}. Setting the per-doc timeout to a low value (as Jan suggested)
>> doesn't really solve the CPU-hogging issue.
>>
>> So, this all boils down to the questions:
>> - what principles for selective replication should be employed?
>> - how can we establish a system of trust for foreign JavaScript? (e.g.
>> code-signing and all that stuff)
>> - is the solution for all this "make the replication regime a db
>> configuration option"?
>>
>> Although it would break with several of CouchDB's traditions, a solution
>> that is secure and versatile could be a descriptive approach. Something
>> like this (simplified json):
>>
>> replicate: {
>>  input: {
>>    <Param1>: { type: int; default: 42; },
>>    <Param2>: { type: string; default: "wiki/"}
>>  }
>>
>>  filter: {
>>    <set of filters>
>>  }
>> };
>>
>> with a filter being recursively defined:
>>  filter: {
>>   type: <and,or,xor,not>;
>>   filter: <recursive definition of a filter>
>>  }
>>
>> With the filter-family and,or,xor,not describing how the recursive
>> sub-filters should be composed, and a 2nd type of filter:
>>
>> filter: {
>>  type: match;
>>  filter: <Json struct>
>> }
>>
>> with <Json struct> being any JSON object, which may embed constructs
>> '$<Param1>$' that are dynamically substituted with the script's input
>> variables.
>> With such a descriptive filter definition we can match
>> a) several types of documents by using an or-filter together with
>> several sub-filters which are match-filters
>> b) more importantly: we can start reasoning on the filters and enforce
>> several security constraints (e.g. max. filter depth: 3, only or filters
>> allowed, only 2 match filters allowed, etc.).
>>
>> I would like to hear what you think about all the different approaches.
>>
>>
>> Martin
>>
>> P.S.: Again, sorry for not being able to provide code. The sole purpose
>> of this email is to document some thoughts and others' ideas.
>
>
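
For illustration, the descriptive filter idea from Martin's mail might be evaluated roughly as below. This is a sketch under several assumptions that are not in the original proposal: sub-filters live in a list under a hypothetical "filters" key, a match filter means "every key in the pattern equals the document's value", and '$Name$' is substituted only when it is the whole string.

```python
# Hypothetical sketch of evaluating a descriptive (data-only) filter.
# The "filters" key, equality-based matching, and whole-string
# substitution are assumptions made for this example.

def substitute(value, params):
    """Replace '$Name$' strings with the matching input parameter."""
    if (isinstance(value, str) and len(value) > 2
            and value.startswith("$") and value.endswith("$")):
        return params[value[1:-1]]
    if isinstance(value, dict):
        return {k: substitute(v, params) for k, v in value.items()}
    if isinstance(value, list):
        return [substitute(v, params) for v in value]
    return value


def depth(f):
    """Nesting depth of a filter; a lone match filter has depth 1."""
    if f["type"] == "match":
        return 1
    return 1 + max(depth(sub) for sub in f["filters"])


def check(f, max_depth=3):
    """Reject filters that nest deeper than the admin allows."""
    if depth(f) > max_depth:
        raise ValueError("filter nests too deeply")


def evaluate(f, doc, params):
    """Decide whether doc passes filter f; no user code is executed."""
    kind = f["type"]
    if kind == "match":
        pattern = substitute(f["filter"], params)
        return all(doc.get(k) == v for k, v in pattern.items())
    results = [evaluate(sub, doc, params) for sub in f["filters"]]
    if kind == "and":
        return all(results)
    if kind == "or":
        return any(results)
    if kind == "xor":
        return sum(results) % 2 == 1
    if kind == "not":
        return not results[0]  # assumes exactly one sub-filter
    raise ValueError("unknown filter type: %r" % kind)


params = {"Kind": "wiki"}
wiki_or_blog = {
    "type": "or",
    "filters": [
        {"type": "match", "filter": {"type": "$Kind$"}},
        {"type": "match", "filter": {"type": "blog"}},
    ],
}
check(wiki_or_blog)
is_wiki = evaluate(wiki_or_blog, {"_id": "x", "type": "wiki"}, params)
is_image = evaluate(wiki_or_blog, {"_id": "y", "type": "image"}, params)
```

Because the filter is plain data, limits such as the maximum depth can be checked before anything is evaluated, and evaluation always terminates.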
