manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: How to map the atlassian confluence security model to manifoldcf
Date Thu, 30 May 2013 11:28:42 GMT
"But my colleague is worried that this solution does not scale well on solr
and that we will have to deal with very long user lists. (We implement this
for a ~300,000 people company)."

I think Solr will be fine here.  This is no worse than a document that has
300000 words, and Solr performs queries against such documents very well.
The only real concern is how long ManifoldCF will take to post such
documents to Solr, and whether you need to give it more memory. ;-)

At some point this obviously would break down, but I don't think we'll see
a company that big in my lifetime.
Karl


On Thu, May 30, 2013 at 7:19 AM, Markus Schuch <markus_schuch@web.de> wrote:

> Hi Karl,
>
> sorry for not being very responsive.
> We had a lot to do this week. We created a Confluence plugin to add an API
> to Confluence that can give us all the information about permissions we need.
>
> Your proposal to calculate the minimal user/group list (take groups where
> possible, create intersection of userlists where needed) sounds promising
> to me.
> But my colleague is worried that this solution does not scale well on solr
> and that we will have to deal with very long user lists. (We implement this
> for a ~300,000 people company). At the moment we don't know how long the
> biggest userlist will be.
>
> So, the next step is to examine the content and the permissions after our
> admins have installed the brand new plugin. When we have an overview of how
> our admins work with page permissions and how big our groups and the
> resulting intersected user lists are, then we will decide which way to go.
>
> I'll keep you updated.
>
> Markus
>
> On 30.05.2013 12:17, Karl Wright wrote:
> > Hi Markus,
> >
> > Have you had any luck with this?
> >
> > Karl
> >
> >
> >
> > On Sun, May 26, 2013 at 9:32 AM, Karl Wright <daddywri@gmail.com> wrote:
> >
> >     Hi Markus,
> >
> >     The usual way these things map is that there is an API call that gets
> >     a list of groups and users that can see the resource, and *maybe*
> >     there's a list of groups and users that are prohibited from seeing the
> >     resource.  These user ids and group ids get used as access tokens.  The
> >     semantics of the ManifoldCF access tokens are that prohibitions
> >     supersede allowances.  The authority service then simply returns the
> >     user id and a list of group ids to which the user belongs, provided
> >     such functionality exists in the API.
> >
> >     In the case of Atlassian, where parents have both prohibition lists
> >     as well as allowance lists, it is usually the case that the prohibition
> >     lists can simply be unioned when they are flattened.  Being a member of
> >     any prohibited group in the hierarchy is sufficient to exclude a user
> >     from seeing the resource.  For allowance lists, however, it is not
> >     possible to merge the lists in a simple way, since as you point out you
> >     are trying to capture an "AND" relationship.  To make this concrete,
> >     say you have three objects - A->B->C, and let's say P(A) is the allow
> >     list for A, P(B) for B, etc.  Then, you want
> >     "user_in(P(A)) AND user_in(P(B)) AND user_in(P(C))".
> >
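As an illustration only (not code from this thread), the flattening rule
described above can be sketched in a few lines of Python. The function name
can_see and the set-based representation are assumptions made for the
example, and an empty allow list is assumed to mean "no restriction at this
level"; none of this is ManifoldCF or Confluence API.

```python
from typing import Iterable, Set


def can_see(user_tokens: Set[str],
            allow_chain: Iterable[Set[str]],
            deny_chain: Iterable[Set[str]]) -> bool:
    """Hypothetical check for one document in a hierarchy such as A->B->C.

    user_tokens: the user's own id plus the ids of the groups they belong to.
    allow_chain: one allow list per level, i.e. P(A), P(B), P(C).
    deny_chain:  one prohibition list per level.
    """
    # Prohibitions can simply be unioned: one hit anywhere excludes the user.
    denied = set().union(*deny_chain)
    if user_tokens & denied:
        return False
    # Allowances must hold at every level (the "AND" relationship).
    # Assumption: an empty allow list means the level is unrestricted.
    return all(not allow or bool(user_tokens & allow) for allow in allow_chain)
```

For instance, a user holding the tokens {"bob", "G2", "G4"} passes allow
lists [{"G1", "G2"}, {"G2", "G3"}, {"G4"}] as long as none of those tokens
appears in any prohibition list.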
> >     I agree that the only viable way to flatten this is to create an
> >     access token for every combination of group permissions you are likely
> >     to see.  So if there were the groups G1 G2 G3 G4 and G5, there would
> >     have to be access tokens for "G1 AND G2", "G2 AND G3",
> >     "G1 AND G2 AND G3", etc.  The authority service would then be stuck
> >     returning a combinatorially large number of access tokens, and that
> >     would not do at all.
> >
> >     An alternative is to try and find a way to implement the AND
> >     relationship between access tokens natively.  To do it this way
> >     requires an open-ended and potentially combinatorially large number of
> >     index fields.  You'd need one such field per page, seems to me.  In
> >     theory Solr has a way of creating N fields at index time, where you
> >     just use a special field prefix, and the field is created.  But there
> >     are two problems with this.  First, at query time, the Lucene query the
> >     Solr plugin would need to build would contain a clause for every page
> >     in Atlassian.  That's not going to work.  Second, we'd need a default
> >     value for access tokens for all pages in Atlassian for every document
> >     indexed, and I don't think that's configurable in Solr either.
> >
> >     Another alternative is to post-filter results.  This will require
> >     significant support in ManifoldCF, especially in the authority
> >     connector, but it could be added with not too much trouble.  The
> >     downside is that there are going to be cases where one would need to go
> >     through a lot of results to find the few that one is allowed to see.
> >     I'm willing to do this, though, if there are no better alternatives.
> >
> >     But there's one more possibility, which is worth thinking about.
> >     Specifically, try the approach of actually calculating the minimal
> >     user/group list for the document, at indexing time.  So the access
> >     tokens are group id's and user id's, and the connector logic actually
> >     calculates the minimal intersection of P(A), P(B), and P(C) in the
> >     example above.
> >
> >     Example 1:
> >     P(A) was G1 or G2
> >     P(B) was G2 or G3
> >     P(C) was G4
> >
> >     ...then the logic would explicitly find all users which matched ALL of
> >     those criteria - which would mean that the access token list for the
> >     document would be a list of individual user id's in this case, not
> >     groups - specifically the list of user ids of those users that belong
> >     to G4 and also to G2 (or to both G1 and G3).
> >
> >     Example 2:
> >     P(A) was G1 or G2 or G3
> >     P(B) was G2 or G3
> >     P(C) was G3
> >
> >     ...then the logic would return just the group id for G3.
> >
> >     The only problem with this approach that I can see is that if the
> >     sysadmin structures things like example 1, the only way a user would be
> >     rendered unable to see such a document would be via reindexing.
> >     Changing the user's group affinity alone would not be sufficient in
> >     that case.  However, I strongly suspect that real Atlassian sysadmins
> >     do things more like Example 2 than Example 1.  What do you think?
> >
> >     Karl
> >
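As an illustration only (not code from this thread), the index-time
calculation of a small user/group token list could look roughly like the
Python sketch below. The function name minimal_tokens, the "group:" and
"user:" token prefixes, and the in-memory membership map are all
assumptions; a real connector would obtain memberships from Confluence. The
sketch emits a group token only when the group appears on every level's
allow list, so the result is small but not guaranteed to be the smallest
possible list.

```python
from typing import Dict, List, Set


def minimal_tokens(allow_lists: List[Set[str]],
                   members: Dict[str, Set[str]]) -> Set[str]:
    """allow_lists: allowed group ids per level, i.e. P(A), P(B), P(C).
    members: group id -> user ids in that group (hypothetical lookup)."""
    if not allow_lists:
        return set()
    # Users allowed at each level, then intersected across all levels.
    per_level = [set().union(*(members[g] for g in groups))
                 for groups in allow_lists]
    allowed_users = set.intersection(*per_level)
    # A group listed at every level grants access by itself.
    common_groups = set.intersection(*allow_lists)
    covered = set().union(*(members[g] for g in common_groups))
    tokens = {f"group:{g}" for g in common_groups}
    # Everyone else who qualifies has to be listed individually.
    tokens |= {f"user:{u}" for u in allowed_users - covered}
    return tokens


# Made-up membership data for the two examples.
members = {
    "G1": {"alice", "bob"},
    "G2": {"bob", "carol"},
    "G3": {"dave"},
    "G4": {"carol", "erin"},
}
example1 = minimal_tokens([{"G1", "G2"}, {"G2", "G3"}, {"G4"}], members)
example2 = minimal_tokens([{"G1", "G2", "G3"}, {"G2", "G3"}, {"G3"}], members)
```

With this made-up data, example1 contains only an individual user token
(carol, the one user who passes all three levels), while example2 collapses
to the single group token for G3. Example 1 also shows why reindexing is
needed when memberships change: the user ids are baked into the document.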
> >
> >
> >     On Sat, May 25, 2013 at 8:20 PM, Markus Schuch <markus_schuch@web.de>
> >     wrote:
> >
> >         Hi Karl,
> >
> >         no need to apologize... a response in less than 24 hours to an
> >         open source project's mailing list entry is perfect to me ;) - so
> >         thank you for the quick response and thank you for sacrificing your
> >         valuable holiday weekend time.
> >
> >         The Confluence API returns user and/or group names when requesting
> >         permissions for a page.
> >
> >         see:
> >         https://developer.atlassian.com/display/CONFDEV/Remote+Confluence+Methods#RemoteConfluenceMethods-Permissions.1
> >         https://developer.atlassian.com/display/CONFDEV/Remote+Confluence+Data+Objects#RemoteConfluenceDataObjects-contentpermissionContentPermission
> >
> >         But the API methods for retrieving page permissions do not respect
> >         permissions inherited from parent pages which is very sad. (refer to
> >         https://jira.atlassian.com/browse/CONF-14965)
> >
> >         To work around this problem we will have to write a Confluence
> >         plugin that can give us the effective permissions for a page.
> >         We looked into that and we think it is possible.
> >         In theory the effective page permissions retrieved by our plugin
> >         would be a list of group names and/or usernames. The group names
> >         have to be ANDed to respect permissions inherited from parent
> >         pages. We can concatenate all needed combinations of group and user
> >         names to single accesstokens to create a "flattened" version of the
> >         permission hierarchy. So far so good...
> >
> >         But another problem arises:
> >         The authority connector would also have to return accesstokens that
> >         are compatible with the flattened permission hierarchy and
> >         therefore we must build all possible combinations of the user's
> >         group names. If our math is correct, there will be (2^n)-1 access
> >         tokens for a user (where n is the number of distinct groups the
> >         user is a member of). Additionally there will be more combinations
> >         with the username. This will most probably not perform well for
> >         users with many group memberships.
> >
> >         I see these 2 options:
> >         - We could implement folder level accesstokens for a constant
> >         number X of folder levels. So the outputconnector would need to
> >         reject documents with a number of folder levels greater than X.
> >         Maybe there is a built-in limit of page levels in Confluence... if
> >         not, then this solution is not ideal.
> >         - Start to think about post filtering...
> >
> >         Regards,
> >         Markus
> >
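As an illustration only (not code from this thread), the (2^n)-1 figure
above can be checked by enumerating every non-empty subset of a user's
groups. The function name combined_tokens and the "&" separator are
assumptions made for the example; the thread does not specify how
concatenated tokens would be spelled.

```python
from itertools import combinations
from typing import Iterable, List


def combined_tokens(groups: Iterable[str]) -> List[str]:
    """Return one concatenated token per non-empty subset of the groups."""
    ordered = sorted(set(groups))  # fixed order so each subset has one spelling
    return ["&".join(subset)
            for size in range(1, len(ordered) + 1)
            for subset in combinations(ordered, size)]


three = combined_tokens(["G1", "G2", "G3"])
ten = combined_tokens(f"G{i}" for i in range(10))
```

Three groups yield 7 tokens and ten groups yield 1023, so the list the
authority connector has to return doubles with every additional group
membership, before any combinations with the username are added.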
> >         -----------------------------------------
> >
> >         Sent: Saturday, 25 May 2013 at 16:54
> >         From: "Karl Wright" <daddywri@gmail.com>
> >         To: "user@manifoldcf.apache.org" <user@manifoldcf.apache.org>
> >         Subject: Re: How to map the atlassian confluence security model to
> >         manifoldcf
> >
> >         Hi Markus,
> >
> >         Sorry for the slow response - it is a holiday weekend in the
> >         States, and that has managed to impact me to some degree.
> >         Anyhow, I've looked at the doc on Atlassian security, and I have
> >         some questions.  First, when you call the Atlassian API, and
> >         request security information for a document, in what form does it
> >         come back?  If it comes back as a minimal list of groups and users
> >         which can see the document, then you probably just want the access
> >         tokens for this connector to be group names/ids and user names/ids.
> >         If it is more complicated, and basically you have to ascend the
> >         hierarchy either explicitly or implicitly, then we'll have to work
> >         a bit harder.  Either we'll have to find a flat mapping of folders
> >         to access tokens, or we'll have to look at extending the framework
> >         to handle more stuff.
> >
> >         As far as the folder-level security, the reason it is deprecated at
> >         the moment is because it is very challenging to implement properly
> >         in a standard search engine with a fixed schema, since there are N
> >         possible folder parents, where N is determined by an individual
> >         document.  Furthermore, the model is not really applicable to the
> >         case where there is a hierarchy that cannot be flattened. But,
> >         depending on what the answer is to my question above, if needed we
> >         can try to come up with a workable folder implementation, and
> >         extend the Solr connector and plugins as well.
> >
> >         Karl
> >
> >
> >
> >         On Fri, May 24, 2013 at 6:57 PM, Markus Schuch
> >         <markus_schuch@web.de> wrote:
> >
> >         Hi,
> >
> >         we are currently writing a repository connector for Confluence.
> >         We are using the Solr output connection on Solr 4.x.
> >         Seeding, versioning, processing works already and now we have to
> >         face security.
> >
> >         Compared to the repositories already supported by mcf, Confluence
> >         seems to have a different security model.
> >
> >         There are "Space" permissions for a whole wiki space and these can
> >         easily be mapped as shareAllowTokens but there are also page
> >         restrictions. Page restrictions are attached to each page (page =
> >         document) and page restrictions are inherited.
> >
> >         See "Example of Child Page Restrictions" in the Confluence Doc:
> >         https://confluence.atlassian.com/display/DOC/Page+Restrictions
> >
> >
> >         The inheritance of page restrictions makes things difficult.
> >         If we are correct, then it is not sufficient to add the page
> >         restrictions as document level access tokens, because the query
> >         time filtering handles the user's access tokens (e.g. group
> >         memberships) as a disjunction. Instead we probably need a
> >         hierarchical, folder based structure of access tokens to map the
> >         inheritance of the page restrictions correctly.
> >         The current Solr SearchComponent does not support folder level
> >         access tokens and the book (mcf in action) says that these kinds of
> >         tokens are considered deprecated.
> >         To cut a long story short... we are stuck at the moment.
> >
> >         Our questions:
> >         Did anyone already manage to map Confluence security to mcf/solr?
> >         Or does somebody have an idea how a Confluence-like security model
> >         can be mapped to mcf/solr?
> >
> >         Thanks in advance
> >         Markus
> >
> >
> >
>
