lucene-dev mailing list archives

From <karl.wri...@nokia.com>
Subject RE: FW: Solr and LCF security at query time
Date Thu, 29 Apr 2010 12:10:23 GMT
Putting access control lookup at search-result time has the following benefits:

- It sees changes right away, when the underlying repository changes

Here are the drawbacks, as far as I can see:

- There's a significant extra load on the repository, because every search result has to be
checked against the repository in real time
- It will perform very poorly on queries where there are a lot of matching documents, but the
search user can't see most of them

Having only one general solution means that you have to pick one or the other of the two models.
We opted for the model we did because the drawbacks were potentially severe, especially under
conditions of high demand.  The repository load question is not a trivial one, because it
scales as the number of results returned, which is a potentially gigantic number.
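
To make the scaling concern concrete, here is a minimal sketch of result-time filtering. It is not LCF or Solr code: the function name, the can_read callback, and the toy repository rule are all assumptions made up for illustration. It counts how many live repository checks are needed to fill one page of results when the user can see only a small fraction of the matches.

```python
from typing import Callable, Iterable, List, Tuple


def filter_at_query_time(
    matches: Iterable[str],
    user: str,
    can_read: Callable[[str, str], bool],
    page_size: int,
) -> Tuple[List[str], int]:
    """Return up to page_size visible documents plus the number of checks made."""
    visible: List[str] = []
    checks = 0
    for doc_id in matches:
        if len(visible) >= page_size:
            break
        checks += 1  # stands in for one real-time call to the repository
        if can_read(user, doc_id):
            visible.append(doc_id)
    return visible, checks


# Hypothetical repository rule: the user may read only every 1000th document.
def can_read(user: str, doc_id: str) -> bool:
    return int(doc_id) % 1000 == 0


matches = [str(i) for i in range(1, 20001)]
page, checks = filter_at_query_time(matches, "alice", can_read, page_size=10)
print(len(page), checks)
```

In this toy setup, filling a page of 10 results takes 10000 repository checks, which is the second drawback listed above: the cost follows the number of matches examined, not the number of results shown.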

However, I am perfectly fine with supporting both models.  Your suggested solution will work
for some classes of problem.  It seems to me that supporting it will require a parallel
infrastructure.  We could develop that infrastructure within LCF, but it's a bit
of work to do:

(1) Output an "internal repository document security identifier" into the index, in addition
to tokens.  This id is not the same at all as the document's URI, which is what literal.id
is currently set to, so a new solr schema field would need to be made for this.  All output
connectors would need to be modified to do this, and all repository connectors as well.
(2) Since the security identifier would be valid within the context of a given repository
connection, the "authority service" code that tries to verify visibility of a document given
the authenticated user name and security identifier would need to look up the correct repository
connection and call a method within it - which currently doesn't exist.  So we'd need to write
such a method for all connectors that have security.
(3) Since this service would have a high load, and only be used under one particular model,
I'd suggest actually defining a whole new webapp for it, so it can be distributed/controlled
independently.
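
The shape of points (1) and (2) could be sketched roughly as follows. This is only an illustration under stated assumptions: the class names, the is_visible method, and the connection and identifier values are hypothetical, and as noted above no such method exists in the connectors today. The index is assumed to carry a connection name and a security identifier for each document, and the authority service uses the connection name to find the connector to ask.

```python
from typing import Dict, Protocol, Set


class RepositoryConnection(Protocol):
    """Hypothetical interface each security-aware connector would implement."""

    def is_visible(self, user: str, security_id: str) -> bool: ...


class InMemoryConnection:
    """Stand-in connector backed by a dict; a real one would query the repository."""

    def __init__(self, acl: Dict[str, Set[str]]) -> None:
        self._acl = acl

    def is_visible(self, user: str, security_id: str) -> bool:
        return user in self._acl.get(security_id, set())


class AuthorityService:
    """Routes a visibility check to the connection that owns the identifier."""

    def __init__(self, connections: Dict[str, RepositoryConnection]) -> None:
        self._connections = connections

    def check(self, user: str, connection_name: str, security_id: str) -> bool:
        connection = self._connections.get(connection_name)
        if connection is None:
            return False  # unknown connection: deny rather than guess
        return connection.is_visible(user, security_id)


# Example values are invented for illustration only.
service = AuthorityService({"repo-uk": InMemoryConnection({"sec-42": {"alice"}})})
print(service.check("alice", "repo-uk", "sec-42"))
print(service.check("bob", "repo-uk", "sec-42"))
```

Because the security identifier only means something inside one repository connection, the connection name has to travel with it; that is why the sketch keys the lookup on both values.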

Karl


________________________________
From: ext Peter Sturge [mailto:peter.sturge@googlemail.com]
Sent: Thursday, April 29, 2010 5:35 AM
To: connectors-user@incubator.apache.org
Cc: dev@lucene.apache.org; connectors-dev@incubator.apache.org; lucene-dev@apache.org
Subject: Re: FW: Solr and LCF security at query time

Hi Karl,

I guess it comes down to - any solution is ultimately going to place access control on a search
and not on data, so there isn't much to be gained by binding the access control to the data.
Whatever attributes exist at index time to build an acl will still be there at query time,
so by making the acl search-bound, the acl is decoupled from the data, allowing it to be used
in any use case scenario.
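
One possible reading of a search-bound acl, sketched under assumptions: the acl lives in its own store (a plain dict here), keyed by user, and is turned into a filter clause when the query is issued. The shop_id field name and the build_filter_query helper are hypothetical and not part of Solr or LCF; the output uses ordinary Lucene query syntax.

```python
from typing import Dict, Optional, Set


def build_filter_query(
    user: str, acl: Dict[str, Set[str]], field: str = "shop_id"
) -> Optional[str]:
    """Build a filter clause from the user's current grants, or None if there are none."""
    allowed = sorted(acl.get(user, set()))
    if not allowed:
        return None  # caller should skip the search instead of running it unfiltered
    quoted = " OR ".join(
        '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
        for value in allowed
    )
    return f"{field}:({quoted})"


# The acl can change between queries without touching the indexed documents.
acl = {"alice": {"shop-7", "shop-2"}}
print(build_filter_query("alice", acl))
acl["alice"].discard("shop-7")
print(build_filter_query("alice", acl))
```

Nothing in the index has to change when a grant changes; only the acl store is updated, and the next query picks it up.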

Here's a typical sampling of use cases where the decoupling of acl from data is required:

One customer has a 'shop-search' requirement where logged-in users' access to various shops
changes daily, sometimes 4 or 5 times a day. There are several hundred such shops and tens
of millions of documents, and the indexing part doesn't have ownership of any of the 'source'
documents.

Another example is a customer who has multiple sites and multiple AD domains. They have one
domain for the UK, but a completely separate domain for Gibraltar. When data is replicated
to remote servers accessed by Gibraltar staff, these users have no SID information in the
other domain.

An 'interesting' example of this at the extreme is 34rkl4ys Bank, where, due to departmental
history, they have no fewer than 85 AD domains! This of course is a nightmare in itself, but
trying to tie access information to data at storage time is virtually impossible in this environment.

The thing I'm trying to understand is that the decoupled approach works equally well for the
requirements where you do have acl information at index time. I guess I'm not understanding
the advantages to making schema changes and binding acl to data, when there's really no need.
I particularly like your idea of using LCF as the facilitator of storing/retrieving such decoupled
data (as opposed to just an xml file). It sounds like there's even a user interface for 'non-technical'
staff to make acl configuration changes. That's really cool, and ultimately an elegant solution
that will fit present and future needs.


Kind regards,
Peter


On Thu, Apr 29, 2010 at 1:24 AM, <karl.wright@nokia.com> wrote:
Hi Peter,

I'm more than happy to hear your customer's requirements, so no problem there.  It does seem
to me that they are a bit different than what I've seen.  I think there is plenty of room
for different flavors of solution, so please by all means go ahead and propose your take on
it!

Karl

________________________________________
From: ext Peter Sturge [peter.sturge@googlemail.com]
Sent: Wednesday, April 28, 2010 8:07 PM
To: dev@lucene.apache.org
Cc: connectors-user@incubator.apache.org; connectors-dev@incubator.apache.org; lucene-dev@apache.org
Subject: Re: FW: Solr and LCF security at query time

Hi Karl,

I wasn't trying to put paid to your design proposal, really the opposite - to highlight
requirements that have been found to be necessary for customers/users, and to hopefully get the
best functionality for the product. If you feel I've put you out on any of the issues raised,
then I apologize for that; it was certainly not my intention.

Peter


