couchdb-dev mailing list archives

From Chris Anderson <jch...@apache.org>
Subject Re: Proposal: Reader Access Control Lists
Date Mon, 29 Aug 2011 23:48:37 GMT
On Sat, Aug 27, 2011 at 7:50 PM, Jason Smith <jhs@iriscouch.com> wrote:
> Hi, Chris. I totally agree with the requirements this idea serves.
> I've got some quick questions inline.
>
> On Sun, Aug 28, 2011 at 2:39 AM, Chris Anderson <jchris@apache.org> wrote:
>> So for instance if I have a private database setup for my message
>> browsing couchapp to run in, and there is a public database on a
>> server I trust, that runs reader_acls, then I can set up continuous
>> replication from there. Anyone in my organization who wants to
>> circulate a document among an adhoc group of people, would drop it in
>> the shared database, with the group of folks listed on the document.
>> Then it would be visible to them as they replicate, but not to anyone
>> else.
>
> What would a _changes query look like to me? Would it look like
> filtered replication, where I simply never see updates that aren't
> approved?

YES that's the idea. A database in reader_acl mode basically looks
like a database which is only good for one thing, filtered
replication, where the filter is computed based on the document._acl
field and the userCtx, using the same rules we have today for database
membership.
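
To make that rule concrete, here is a minimal sketch of such a filter
in JavaScript. It is not code from CouchDB: the helper name canRead is
hypothetical, the _acl shape is the names/roles one proposed in this
thread, and the admin bypass is an assumption modeled on how _security
treats server admins.

```javascript
// Hypothetical helper: may userCtx read doc under the proposed _acl rules?
function canRead(doc, userCtx) {
  var acl = doc._acl || {};
  var names = acl.names || [];
  var roles = acl.roles || [];
  // Same convention as _security members: empty lists mean "public".
  if (names.length === 0 && roles.length === 0) return true;
  var userRoles = userCtx.roles || [];
  // Assumption: server admins can always read.
  if (userRoles.indexOf("_admin") !== -1) return true;
  // Listed by name?
  if (userCtx.name && names.indexOf(userCtx.name) !== -1) return true;
  // Holds at least one listed role?
  return roles.some(function (role) {
    return userRoles.indexOf(role) !== -1;
  });
}

// Shaped like a CouchDB filter function: (doc, req) -> boolean.
function readerAclFilter(doc, req) {
  return canRead(doc, req.userCtx);
}
```

Under this sketch a doc with no _acl, with _acl = {}, or with empty
names and roles lists passes the filter for every user.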

>
> But if I never see updates, how does the _changes responder know which
> to show and which to hide? Would it fetch each doc and read the _acl
> value?

Yes, it'd have the same disk I/O costs as a filter. We could potentially
optimize dbs in this mode by moving the _acl storage to the by-seq
btree nodes instead of the doc bodies, but thinking about that now is
way too early.

>
> OTOH, if I can see update records, what would be the value of the
> "doc" field if I query include_docs=true?

Right, docs you can't see aren't in the changes feed. All you can see
is that the feed skips more sequence numbers than are accounted for by
just the updates you can see.
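
A rough sketch of what that looks like from the reader's side. The
rows, the helper name visibleChanges, and the visibility predicate are
all made up for illustration; a real implementation would walk the
by-seq index rather than an array.

```javascript
// Hypothetical in-memory stand-in for a database's by-sequence index.
var bySeq = [
  {seq: 1, id: "a", doc: {_id: "a"}},
  {seq: 2, id: "b", doc: {_id: "b", _acl: {names: ["bob"], roles: []}}},
  {seq: 3, id: "c", doc: {_id: "c", _acl: {names: [], roles: ["dogs"]}}},
  {seq: 4, id: "d", doc: {_id: "d"}}
];

// Keep only rows whose doc passes the predicate. Hidden rows are dropped
// entirely, so neither their ids nor their revs reach the reader.
function visibleChanges(rows, isVisible) {
  return rows
    .filter(function (row) { return isVisible(row.doc); })
    .map(function (row) { return {seq: row.seq, id: row.id}; });
}

// A reader who is on no ACL sees only the docs without an _acl.
var seen = visibleChanges(bySeq, function (doc) { return !doc._acl; });
// seen holds the rows for seq 1 and seq 4; seqs 2 and 3 are simply absent.
```

The gap between 1 and 4 is the only hint that something else changed.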

>
> Even if the "doc" value is null (to hide it from the user), leaking
> the _id and revs (the "changes" field) may have security implications.
>
> An _id might have an email address or other private information.

Right, ids should be hidden.

I'm not sure what to do about leakage of the existence of a document
due to update conflicts. I'm kinda thinking that if someone is probing
for ids, they aren't using the probe to harvest email addresses. Don't
store valuable secrets in your docids...

>
> A _rev (I'm reaching here, but stick with me) might also leak
> information. Consider an auction with secret bidding. Each lot is a
> document. Observing how _rev values change might inform you how
> frequently bids are made on the lot, hinting at which lots are selling
> well and which are going to be cheap. And the _id might tell you which
> lot it is.
>

_rev should definitely be secret

> What will deleted documents look like? Against all reason and
> propriety, people are storing data in "deleted" documents--in
> production!. (They say it is for auditing.) People also use HTTP
> DELETE, as well as updates like {"_id":"foo","_deleted":true}. When I
> replicate, it would be nice to receive this delete event; but I am not
> on the ACL anymore. Or can everybody see a deleted document? But then
> couldn't they see the extra data in there for auditors' eyes only
> (also the problematic _id and _rev trees)? Or do deleted docs obey
> ACLs just like any other document revision? That's a shame, because
> HTTP DELETE would implicitly strip all users from the ACL. Or does
> HTTP DELETE trigger ACL inheritance? Which revision does it inherit
> from?

DELETE should maintain the ACL. So after a delete the doc would have
the _acl and _deleted, if the DELETE verb was used. If someone does a
PUT or a _bulk_docs POST to delete, we should probably respect their
wishes regarding the contents of _acl. If they start to leak deleted
docids by stripping ACLs in their code, that is their bug. There has
to be some way of removing the _acl once it is applied.
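
A sketch of the DELETE case only, to show the shape of the resulting
tombstone. The function name tombstoneFor is hypothetical and the
server would assign the new _rev itself, so it is left out here.

```javascript
// Hypothetical: the tombstone an HTTP DELETE would leave behind.
// The body is dropped, but the _acl is carried over so the delete
// is only visible to the people who could read the doc.
function tombstoneFor(doc) {
  var stub = {_id: doc._id, _deleted: true};
  if (doc._acl !== undefined) {
    stub._acl = doc._acl;
  }
  return stub;
}
```

A PUT or bulk update with _deleted: true would skip this step and
store whatever _acl (if any) the client sent.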

>
>> Doing this is possible today but it involves a bunch of filtered
>> replication and app code to enforce that filters are applied.
>
> This is a very important statement. I take you to mean there are
> multiple solutions to this problem; and by implication ACLs are the
> best solution.

You can do this with behind-the-firewall filtered replication and
database per user. With reader access control lists you'll be able to
do it all from a single database syncpoint. (It's just that database
won't have views enabled, so anyone who wants to browse it will want
to browse a replica.)

>
>> Providing an optional shared or reader_acl mode for use at sync points
>> seems like a user friendly way to simplify something people already
>> want to do.
>>
>> A potential design:
>>
>> On the _security object setting reader_acl = true would enable the
>> reader access control lists, and make _views and _lists (and geocouch,
>> etc) into admin-only resources.
>
> Would another security model possibly come along later? If there are
> choices among mutually-exclusive models, maybe it should be
>
>    mode = "acl"
>
> For example, a blacklist-based security policy might be neat. Or maybe
> the "closed source couch app" where ddocs aren't visible but _show,
> _list, and _update are. You wouldn't want those all enabled at the
> same time would you? Or would you?!? :)
>

Yes I think the other things you describe would not be mutually
exclusive. reader_acl is just incompatible with views. So the views
would be admin only when reader_acl is true. So we are ok to make
setting _security.reader_acl = true turn off views. We just need to
document that clearly, and the sorry-no-views error message should be
clear as well.

> Maybe this is bikesheddy at this stage.
>
>> I'm imagining the way the reader ACLs would look on the documents is a
>> new top level field "_acl" that has a similar names/roles value
>> structure as the _security object:
>
> A note about CouchDB adoption and comprehensibility: usually ACLs
> (especially role-based systems like Couch) are not simply one list of
> readers; but a matrix with rights as columns and roles as rows; and
> you have yes/no values for roles/rights combinations.
>
> Perhaps a bit more structure, then:
>
>    { "_id": "someid"
>    , "_acl":
>      { "read": {"names": [...], "roles": [...]}
>      , "write": {"names": [...], "roles": [...]}
>      , "can_update_on_tuesdays": {"names"..., "roles"...}
>      }
>    }
>
> validate_doc_update could still implement the "write" and
> Tuesday-updates support, but I propose you give them a namespace to
> work with.

You can always stick extra junk in there. So what you describe would
really just be us making the API look like this:

   { "_id": "someid"
   , "_acl":
     { "read": {"names": [...], "roles": [...]}
     }
   }

Here is my original proposal:

   { "_id": "someid"
   , "_acl":
     {"names": [...], "roles": [...]}
   }

I see the use case you are pressing here. I do think having this names
and roles structure in more places will be good for tooling (should we
ever get tooling), but users can accomplish what you describe by
sticking their can_update_on_tuesdays stuff somewhere else in the
document. I guess I don't see the underscore parts of the _acl
internals ever caring about more than "read" ... For me the problem
with writer ACLs is that they would be confusing: enabling reader ACLs
turns off views, but writer ACLs would really be just a declarative
validation function, so they can run whenever. Maybe writer ACLs are
as important as reader ACLs, and can save developers a lot of pain
writing validation functions...

If we think we may want doc._acl.write someday, we are better off
leaving space for it now, with your schema instead of mine. But I am
not sure if we will want those. I want to hear from more people on
this. To me it smells complicated, so I wanna stay away from it for
now.

>
>>
>> {
>> _id : "someid",
>> foo : "bar",
>> _acl : {
>>   names : ["jchris@couchbase.com"],
>>   roles : ["aliens", "dogs"]
>> }
>> }
>>
>> So this is a document that can be read by me, and also by any aliens or dogs.
>
> What is the response for docs with _acl undefined?

anyone can read

> What is the response for docs with _acl = {}?

anyone can read

> What is the response for docs with _acl = {"names":[], "roles":[]}?

anyone can read

>
> Are all three responses the same or do they differ? For the third
> version, is it like the _security behavior where that means everybody
> can read? Or is it *dissimilar* from _security but more like people's
> expectations where *nobody* can read (except admins)?

It should work like the stuff in the _security object.

I get that some people are confused by the empty members list being
public. I think I've heard some sane ideas for how to get us to a more
intuitive system (where maybe there are system roles like _anon and
_user or something that you can put on a db to make it different
degrees of public). Anyway that could totally derail this
conversation, and I think we can keep these discussions independent
and maybe make substantial progress.

Chris


-- 
Chris Anderson
http://jchrisa.net
http://couchbase.com
