couchdb-dev mailing list archives

From Michael Fair <mich...@daclubhouse.net>
Subject Re: [DISCUSS] Per-doc access control
Date Tue, 26 Feb 2019 18:36:13 GMT
One thing I've always been concerned about when it comes to "user based"
document access is how it interacts with cross domain replication, and
mobile replication (or "user sliced" replication).

I believe this requires some kind of "Decentralized ID" concept, or a
mapping of per-database user ids (roles) to system users (users), because
the set of users isn't likely to be the same across administrative domains.

Picture the same database replicating between two administrative domains;
say your domain and my domain ran two separate Couch instances and we
shared a common database.  For example, a "product catalog" with per-user
access controls turned on, so your organization can edit products and their
descriptions and prices, mine can create purchase orders, and we both can
update "tickets/issues/returns".

I think it's super important to figure out how this case ought to behave
when considering the user access design.  We will most likely have
different user sets in our Couch server installations.

I suggest considering that all document databases only have the concept of
role ids, and not user ids, internal to the database and that privileges be
granted to those roles.  It's a requirement to create both a database role
and a system user and link the two to get access to the database; the
default role is called "public" which by default has read/write access to
all documents and users are automatically linked/mapped to it.

It's also fine to name a database role the same as a system userid (e.g.
the 'mfair' role), but the database would not have a concept of a "user"
(authentication), only the "role" (authorization).  The system daemon
handles authenticating users and there's a mapping of system users to
database roles (like the 'mfair' system user could map to the 'mfair'
database role).  The roles (and access privileges) would replicate with the
database but the system users would not.

This isn't a completely baked thought yet, but I think it goes in the right
direction.

Each independent server in a separate administrative domain would have its
own set of users, each server then has to map those system users to
database roles.  This should only have to be done once.

Couch servers could also choose to replicate their user databases too.
Multiple Couch servers that replicate their user databases with each other
is what I'm calling an "administrative domain" or simply "domain" for short.

A user can also be mapped to (i.e. hold) multiple roles in the same
database simultaneously.

This model is very close to how MS SQL Server and other SQL databases
handle users.
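
To make the split concrete, here is a minimal Python sketch of the model
described above.  It is only an illustration under stated assumptions, not
anything that exists in CouchDB: the class names (Database, Server), the
methods, and the privilege strings are all hypothetical.  The point it shows
is that role names and their privileges live in the database and replicate
with it, while the user-to-role mapping stays local to each server.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Database:
    """Replicated state: role names and their privileges (no users)."""
    name: str
    # Hypothetical default: the "public" role can read and write.
    role_privileges: dict = field(
        default_factory=lambda: {"public": {"read", "write"}}
    )

    def grant(self, role, *privileges):
        self.role_privileges.setdefault(role, set()).update(privileges)

    def replicate(self):
        # Roles and privileges travel with the database; users do not.
        return copy.deepcopy(self)


@dataclass
class Server:
    """Local state: the mapping of system users to database roles."""
    user_roles: dict = field(default_factory=dict)

    def map_user(self, user, db, role):
        if role not in db.role_privileges:
            raise KeyError(f"unknown database role: {role}")
        self.user_roles.setdefault((user, db.name), set()).add(role)

    def roles_for(self, user, db):
        # Every user is implicitly mapped to "public".
        return {"public"} | self.user_roles.get((user, db.name), set())

    def privileges_for(self, user, db):
        granted = set()
        for role in self.roles_for(user, db):
            granted |= db.role_privileges.get(role, set())
        return granted


# Two administrative domains sharing one "product catalog" database.
catalog = Database("catalog")
catalog.grant("editor", "edit_products")
catalog.grant("buyer", "create_purchase_orders")

yours = Server()
yours.map_user("alice", catalog, "editor")

mine = Server()
replica = catalog.replicate()          # roles arrive with the data
mine.map_user("mfair", replica, "buyer")
```

In this sketch 'alice' is unknown on my server, so she only gets whatever
"public" grants there; each domain does its own user-to-role mapping once.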

..........
Other ideas include:
- Disclaim that replication with other domains simply breaks user access
controls.  It's too complicated so it's not supported.

- Replication can only be done within the context of a role that has
read/write access to the database's documents (a new replication
parameter).  Authenticating this login for automated replication might be
tricky (remote user credentials stored in the replication database?)...

- Putting all the users inside the database itself so that those ids
replicate in addition to the contents.  But then every server has to be
entrusted to authenticate every user in every database it shares with other
domains, and the set of users becomes the superset of all users across all
participating domains.  Being able to reset a user's password in another
domain because you have access to manipulating the database's users seems
"wrong"...

- Using a decentralized p2p identity scheme like PKI and using Couch itself
as a distributed public key store.  This has the advantage that docs can be
encrypted and decryption secrets protected by keypairs so remote databases
can't automatically read contents they shouldn't...  It's obviously more
complicated than simply trusting the remote administrators and human beings
are notoriously bad at safely keeping secrets (they either end up sharing
them or losing/forgetting them).

- Make an executive decision that CouchDB no longer has a primary use case
for multimaster replication across administrative domains.  This feature is
always what set Couch apart for me.  Replicating documents between
decentralized administrative domains instead of only being a centralized
document repository for a single organization.  I get that folks like IBM
and other large single organization installations really don't care about
replicating/sharing their data with third party organizations; and that
sharing a multimaster distributed database across administrative domains is
not as common as a single organization with their own large private
repository and set of users; but I really like Couch specifically for the
cross domain replication use case.  I think it's a medium term problem that
people are looking for solutions to, and it's non-trivial to solve for.
How can we share "records" securely 'between' many organizations instead of
each organization trying to keep their own separate data instance copies in
sync with each other?  I think Couch, and the Couch replication protocol,
is a leading contender in addressing that challenge.

I bring this up now because I think whatever approach is used to address the
cross domain authorization issue will have a huge influence on the
feature's design (alongside other factors).

Thanks,
Mike
On Feb 26, 2019 3:39 AM, "Jan Lehnardt" <jan@apache.org> wrote:

> Heya Garren,
>
> thanks for having a look. From a code-organisation perspective, some of my
> edits can easily live in a separate app vs. src/couch, that’s mostly a code
> orga task which I’m happy to do. The epi suggestion surely helps with
> the handler overrides.
>
> Some of the changes however have to be in core CouchDB, specifically the
> storing of _access information on the various doc records, in order to
> ensure efficient updates. That’s not something a fully external app can
> manage. Whether it’s an extra field on those records or rather an entry
> in the existing meta field is secondary, but this needs propagating into
> by-id (and maybe also by-seq).
>
> I’m not sure about your suggestion to listen to all access=true DBs’
> _changes feeds to generate the required indexes. That sounds like building
> a new mini couch_mrview/couch_index rather than re-using that
> infrastructure with minimal edits.
>
> As for the FDB option, going through the code this far helped me understand
> all the building blocks required and I think adding this to FDB CouchDB
> would maybe take a week total (i.e. be significantly easier), so I’m not
> aiming to re-use much for that implementation other than the future test
> suite.
>
> That said, I’m very not married to my existing code, and I’d love to hear
> any and all ways to simplify things.
>
> Best
> Jan
> —
>
>
> > On 26. Feb 2019, at 11:18, Garren Smith <garren@apache.org> wrote:
> >
> > Hi Jan,
> >
> > I've been giving this some thought and I wonder if we should take a step
> > back and rethink how we do this. Instead of implementing this directly
> > into the CouchDB core code, it might be better to write this as an
> > application similar to Dreyfus - Cloudant's search[1]. Instead of writing
> > this code directly in the core CouchDB code, we would write it as another
> > application. I'm hoping then that you wouldn't have to make huge
> > modifications to the CouchDB codebase, which should make this easier to
> > do. The application would override the _all_docs and _changes endpoints,
> > and if a user has enabled access=true for that database then you could
> > return the _all_docs and _changes requests from your application. The epi
> > http work is pretty fancy; I think we could do some cool things around
> > that to make this work well. The app would listen to the changes feeds of
> > any database that has access=true and then implement the required indexes
> > for _all_docs and _changes. I think we then would not have to create a
> > custom indexer as we could build the indexes when new changes arrive.
> >
> > I'm also hoping that another advantage of doing this as an app that
> > listens to the changes feed is that there should be minimal work to get
> > this to work when we switch to fdb.
> >
> > This is obviously just an idea I had and I thought I would share it, not
> > in an attempt to derail what you're doing, but hopefully in an attempt to
> > make sure we find the easiest and most effective way to get this done.
> >
> > Cheers
> > Garren
> >
> >
> > [1] https://github.com/cloudant-labs/dreyfus
> >
> > On Sun, Feb 17, 2019 at 4:25 PM Jan Lehnardt <jan@apache.org> wrote:
> >
> >> Hi Everyone,
> >>
> >> I’m happy to share my work in progress attempt to implement the per-doc
> >> access control feature we discussed a good while ago:
> >>
> >>
> >> https://lists.apache.org/thread.html/6aa77dd8e5974a3a540758c6902ccb509ab5a2e4802ecf4fd724a5e4@%3Cdev.couchdb.apache.org%3E
> >>
> >> You can check out my branch here:
> >>
> >> https://github.com/apache/couchdb/compare/access?expand=1
> >>
> >> It is very much work in progress, but it is far enough along to warrant
> >> discussion.
> >>
> >> The main point of this branch is to show all the places that we would
> >> need to change to support the proposal.
> >>
> >> Things I’ve left for later:
> >>
> >> - currently only the first element in the _access array is used. Our
> >> and/or syntax can be added later.
> >> - building per-access views has not been implemented yet, couch_index
> >> would have to be taught about the new per-access-id index.
> >> - pretty HTTP error handling
> >> - tests except for a tiny shell script 😇
> >>
> >> Implementation notes:
> >>
> >> You create a database with the _access feature turned on like so:  PUT
> >> /db?access=true
> >>
> >> I started out with storing _access in the document body, as that would
> >> allow for a minimal change set. However, on doc updates, we try hard not
> >> to load the old doc body from the database, and forcing us to do so for
> >> EVERY doc update under _access seemed prohibitive, so I extended the
> >> #doc, #doc_info and #full_doc_info records with a new `access` attribute
> >> that is stored in both by-id and by-seq. I will need guidance on how
> >> extending these records impacts multi-version cluster interop, and
> >> especially whether this is an acceptable approach.
> >>
> >>
> >> https://github.com/apache/couchdb/compare/access?expand=1&ws=0#diff-904ab7473ff8ddd07ea44aca414e3a36
> >>
> >> * * *
> >>
> >> The main addition is a new native query server called
> >> couch_access_native_proc, which implements two new indexes, by-access-id
> >> and by-access-seq, which do what you’d expect: pass in a userCtx and
> >> retrieve the equivalent of _all_docs or _changes, but only including
> >> those docs that match the username and roles in their _access property.
> >> The existing handlers for _all_docs and _changes have been augmented to
> >> use the new indexes instead of the default ones, unless the user is an
> >> admin.
> >>
> >>
> >> https://github.com/apache/couchdb/compare/access?expand=1&ws=0#diff-fbb53323f07579be5e46ba63cb6701c4
> >>
> >> * * *
> >>
> >> The rest of the diff is concerned with making document CRUD behave as
> >> you’d expect it. See this little demonstration for what things look
> >> like:
> >>
> >> https://gist.github.com/janl/b6d3f7502aa20b7b9ab9d9dcb8e92497 (I’m just
> >> noticing that there might be something wonky with DELETE, but you’ll get
> >> the gist #rimshot)
> >>
> >> * * *
> >>
> >> Open questions:
> >>
> >> - The aim of this is to get as close to regular CouchDB behaviour as
> >> possible. One thing that is new, however, and which would require all
> >> apps to be changed, is that docs in an _access enabled database have to
> >> include an _access field (docs with no _access are admin-only for now).
> >> We might want to consider auto-inserting the authenticated user’s name
> >> as the first element in the _access array on new document writes, so
> >> existing apps “just work”.
> >>
> >> - Interplay with partitioned dbs: eschewing db-per-user is already a
> >> large boon if you have a lot of users, but making those per-user
> >> requests inside an _access enabled database efficient would be doubly
> >> nice, so why not take the username from the first question above and
> >> use that as the partition key? This would work nicely for natural users
> >> with their own docs that want to share them with others later, but I
> >> can easily imagine a pipelined use of CouchDB, where a “collector” user
> >> creates all new docs, an “analyser” takes them over and hands them to a
> >> “result” user for viewing. In that case, we’d violate the
> >> high-cardinality rule of partitions (have a lot of small ones); instead
> >> all docs go through all three users. I’d be okay with treating the
> >> latter scenario as a minor use-case, but for that use-case, we should
> >> be able to disable auto-partitioning on db creation.
> >>
> >> - building access view indexes for docs that have frequent _access
> >> changes leads to many orphaned view indexes; we should look at an
> >> auto-cleanup solution here (maybe keep 1-N indexes in case folks just
> >> swap back and forth).
> >>
> >> * * *
> >>
> >> I’ll leave this here for now, I’m sure there are a few more things to
> >> consider.
> >>
> >> I’d love to hear any and all feedback you might have. Especially if
> >> anything is unclear.
> >>
> >> Best
> >> Jan
> >> —
>
> --
> Professional Support for Apache CouchDB:
> https://neighbourhood.ie/couchdb-support/
>
>
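
For readers following the quoted proposal, the visibility rule it describes
for the by-access-id / by-access-seq indexes can be sketched in Python as
below.  This is only a rough illustration under assumptions, not the code on
the branch (which is Erlang): the function names are hypothetical, and
treating the "_admin" role in userCtx as the admin marker is an assumption.
It follows the stated work-in-progress behaviour: only the first element of
_access is consulted, docs without _access are admin-only, and admins see
everything.

```python
def can_read(doc, user_ctx):
    """Return True if user_ctx may see doc under the rules sketched above."""
    roles = user_ctx.get("roles", [])
    # Assumption: admins are identified by the "_admin" role in userCtx.
    if "_admin" in roles:
        return True
    access = doc.get("_access") or []
    if not access:
        return False  # docs with no _access are admin-only for now
    principal = access[0]  # only the first element is used for now
    return principal == user_ctx.get("name") or principal in roles


def visible_docs(docs, user_ctx):
    """Equivalent of a filtered _all_docs: ids the user is allowed to see."""
    return [doc["_id"] for doc in docs if can_read(doc, user_ctx)]


docs = [
    {"_id": "a", "_access": ["alice"]},
    {"_id": "b", "_access": ["sales", "alice"]},
    {"_id": "c"},
]
alice = {"name": "alice", "roles": []}
bob = {"name": "bob", "roles": ["sales"]}
admin = {"name": "root", "roles": ["_admin"]}
```

Note that in this sketch 'alice' does not see doc "b", because only the
first _access entry ("sales") is checked.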
