lucene-dev mailing list archives

From <karl.wri...@nokia.com>
Subject RE: FW: Solr and LCF security at query time
Date Thu, 22 Apr 2010 13:52:18 GMT
>>>>>>
What is the relationship between stored data (documents) and authorities' access/deny attributes?
(do you have any examples of what an access_token value might contain?)
<<<<<<

Documents have access/deny attributes; authorities simply provide the list of tokens that
belong to an authenticated user.  Thus, there's no access/deny for an authority; that's attached
to the document (as it is in real-world repositories).

Let's run a quick example, using Active Directory and a Windows file system.  Suppose that
you have a directory with documents in it, call it DirectoryA, and the directory allows read
access to the following SIDs:

S-123-456-76890
S-23-64-12345

These SIDs correspond to active directory groups, let's call them Group1 and Group2, respectively.

DirectoryB also has documents in it, and those documents have just the SID S-123-456-76890
attached, because only Group1 can read its contents.

Now, pretend that someone has created an LCF Active Directory authority connection (in the
LCF UI), which is called "myAD", and this connection is set up to talk to the governing AD
domain controller for this Windows file system.  We now know enough to describe the document
indexing process:

- Each file in DirectoryA will have the following __ALLOW_TOKEN__document attributes inside
Solr: "myAD:S-123-456-76890", and "myAD:S-23-64-12345".
- Each file in DirectoryB will have the following __ALLOW_TOKEN__document attributes inside
Solr: "myAD:S-123-456-76890"

Now, suppose that a user (let's call him "Peter") is authenticated with the AD domain controller.
Peter belongs to Group2, so his SIDs are (say):

S-1-1-0 (the 'everyone' SID)
S-323-999-12345 (his own personal user SID)
S-23-64-12345 (the SID he gets because he belongs to Group2)

We want to look up the documents in the search index that he can see.  So, we ask the LCF
authority service what his tokens are, and we get back:

"myAD:S-1-1-0", "myAD:S-323-999-12345", and "myAD:S-23-64-12345"

The documents we should return in his search are the ones matching his search criteria AND
sharing at least one of his tokens with the document's ALLOW tokens, MINUS any document that
shares one of his tokens with its DENY tokens (there aren't any DENY tokens involved in this
example).  So only files that have at least one of his three tokens as an ALLOW attribute
would be returned: every file in DirectoryA (via "myAD:S-23-64-12345"), and none of the files
in DirectoryB.
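The ALLOW/DENY rule in this example reduces to simple set logic. The Python below is a minimal sketch, assuming tokens are compared as exact strings. The function name `is_visible` is hypothetical, and treating a document with no ALLOW tokens as invisible is an assumption, not documented LCF behavior.

```python
# Minimal sketch of the ALLOW/DENY rule from the DirectoryA/DirectoryB example.
# is_visible is a hypothetical helper; tokens are compared as exact strings.

def is_visible(user_tokens, allow_tokens, deny_tokens=frozenset()):
    """True if the user shares at least one ALLOW token with the document
    and shares no DENY token with it. No ALLOW tokens means not visible."""
    user = set(user_tokens)
    return bool(user & set(allow_tokens)) and not (user & set(deny_tokens))


# Token values copied from the example above.
directory_a = {"myAD:S-123-456-76890", "myAD:S-23-64-12345"}
directory_b = {"myAD:S-123-456-76890"}
peter = ["myAD:S-1-1-0", "myAD:S-323-999-12345", "myAD:S-23-64-12345"]

print(is_visible(peter, directory_a))  # True: shares myAD:S-23-64-12345
print(is_visible(peter, directory_b))  # False: no shared ALLOW token
```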

Note that what we are attempting to do is enforce AD's security with the search results we
present.  There is no need to define a whole new security mechanism, because AD already has
one that people use.

>>>>>>
One of the key requirements I've worked to adhere to in SOLR-1872 is to ensure there are no
security or other dependencies of indexed data with any external repository - most notably
the file system.
There are many reasons for wanting this, but one of the main ones is that Solr-stored data
is not always based on file data (or accessible file data). In fact, in my particular case,
almost none of the indexed data comes from files.
<<<<<<

LCF is all about abstracting from repositories.  It's not specifically about a file system,
although that is a convenient example.  If you are building your own kind of repository with
your own security setup, that's fine - but in the LCF world you'd need to create an authority
connector for your repository (which maybe reads your acl.xml file), as well as a repository
connector (which hands documents to LCF and provides it with the access tokens that make security
work).  Of course, you can build something much lighter that doesn't include LCF at all if you are
just integrating a custom repository of your own, but it sounded like you were interested
in the broader problem here.

So, LCF doesn't do "acl mapping" at all.  It relies on its various connectors to work cooperatively
to define access tokens in a way that is consistent from authority connector to repository
connector for a given repository kind.  Anybody can write a connector, so the beauty of all
this is that you can build a system where data from many disparate sources is indexed, and
security for each is simultaneously enforced.
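To make the "consistent tokens" contract concrete, here is a hypothetical sketch of an authority connector and a repository connector for the same custom repository agreeing on one token format, `<connection name>:<native id>`, modeled on the "myAD:..." tokens in the example. The class names and methods are illustrations only; they are not LCF's actual connector API, and the in-memory dict stands in for something like an acl.xml file.

```python
# Hypothetical illustration of the authority/repository token contract.
# NOT LCF's real connector interfaces; names and structure are assumptions.

class AclXmlAuthority:
    """Hypothetical authority connector: maps a user to access tokens."""

    def __init__(self, connection_name, group_memberships):
        self.connection_name = connection_name
        # username -> native group ids (stand-in for parsing an acl.xml file)
        self.group_memberships = group_memberships

    def tokens_for_user(self, username):
        groups = self.group_memberships.get(username, [])
        return [f"{self.connection_name}:{g}" for g in groups]


class AclXmlRepository:
    """Hypothetical repository connector: attaches ALLOW tokens to documents."""

    def __init__(self, connection_name):
        self.connection_name = connection_name

    def allow_tokens(self, readable_by_groups):
        # Same "<connection>:<id>" format as the authority, so the sets match.
        return [f"{self.connection_name}:{g}" for g in readable_by_groups]


authority = AclXmlAuthority("myACL", {"peter": ["editors"]})
repository = AclXmlRepository("myACL")
shared = set(authority.tokens_for_user("peter")) & set(
    repository.allow_tokens(["editors", "admins"]))
print(shared)  # {'myACL:editors'}
```

The only thing the two sides must agree on is the token string format; neither needs to know anything about the search engine.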

Karl


________________________________
From: ext Peter Sturge [mailto:peter.sturge@googlemail.com]
Sent: Thursday, April 22, 2010 9:24 AM
To: dev@lucene.apache.org
Cc: connectors-user@incubator.apache.org; lucene-dev@apache.org; connectors-dev@incubator.apache.org
Subject: Re: FW: Solr and LCF security at query time

Hi Karl,

Thanks very much for the diagram -
Sorry about all the questions, but this raises a few new ones...

What is the relationship between stored data (documents) and authorities' access/deny attributes?
(do you have any examples of what an access_token value might contain?)

One of the key requirements I've worked to adhere to in SOLR-1872 is to ensure there are no
security or other dependencies of indexed data with any external repository - most notably
the file system.
There are many reasons for wanting this, but one of the main ones is that Solr-stored data
is not always based on file data (or accessible file data). In fact, in my particular case,
almost none of the indexed data comes from files.

This is one reason why SOLR-1872 uses filter queries for its access/deny tokens - so that
all the required information for access control completely resides within the Solr index itself.
Is the LCF architecture acl 'mapping' between Solr fields (queries) and users, some external
'repository' (files) and users, or arbitrary data (e.g. either of these)?

I hope that makes sense...

Thanks!
Peter




On Thu, Apr 22, 2010 at 10:25 AM, <karl.wright@nokia.com>
wrote:
Hi Peter,

I've attached a diagram that is not in the wiki as of yet, and I'll try to answer your questions.

>>>>>>
Are the ACCESS_TOKEN and DENY_TOKEN values whatever have been stored for a particular user
in the underlying acl store (e.g. Active Directory)?
How does AD and/or LCF handle storing such data in its schema? (does AD need its schema extended?)
Presumably, any such AD fields would need to be queried for effective rights in order to cater
for group membership allows and denies.
<<<<<<

The ACCESS_TOKEN and DENY_TOKEN values are, in one sense, arbitrary strings that represent
a contract between an LCF authority connection and the LCF repository connection that picks
up the documents (from wherever).  These tokens thus have no real meaning outside of LCF.
You must regard them as opaque.

The contract, however, states that if you use the LCF authority service to obtain tokens for
an authenticated user, you will get back a set that is CONSISTENT with the tokens that were
attached to the documents LCF sent to Solr for indexing in the first place.  So, you don't
have to worry about it, and that's kind of the idea.  So you imagine the following flow:

(1) Use LCF to fetch documents and send them to Solr
(2) When searching, use the LCF authority service to get the desired user's access tokens
(3) Either filter the results, or modify the query, to be sure the access tokens all match
up properly
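Step (3), done by modifying the query, could look roughly like the Python below: it turns the user's tokens into a single Solr filter query (`fq`) over the two fields the LCF Solr output connector posts. This is a sketch under assumptions: the token fields are indexed as non-tokenized string fields, and quoting each token keeps the `:` inside values like "myAD:S-1-1-0" from being read as a field separator. Refusing to build a filter for an empty token list is a design choice of this sketch.

```python
# Sketch: build a Solr fq that enforces ALLOW/DENY tokens at query time.
# Assumes string (non-tokenized) fields named as posted by the LCF connector.

ALLOW_FIELD = "__ACCESS_TOKEN__document"
DENY_FIELD = "__DENY_TOKEN__document"


def quote_term(token):
    """Escape backslashes and double quotes, then wrap the token in quotes."""
    return '"' + token.replace("\\", "\\\\").replace('"', '\\"') + '"'


def security_filter(user_tokens):
    """Require one shared ALLOW token and exclude any shared DENY token."""
    if not user_tokens:
        # Without tokens there is nothing to match; fail closed.
        raise ValueError("no access tokens for user")
    terms = " OR ".join(quote_term(t) for t in user_tokens)
    return f"+{ALLOW_FIELD}:({terms}) -{DENY_FIELD}:({terms})"


params = {
    "q": "quarterly report",  # the user's own search criteria
    "fq": security_filter(["myAD:S-1-1-0", "myAD:S-23-64-12345"]),
}
print(params["fq"])
```

Putting the restriction in `fq` rather than `q` leaves relevance scoring to the user's own query.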

For the AD authority, the LCF access tokens consist, in part, of the user's SIDs.  For other
authorities, the access tokens are wildly different.  You really don't want to know what's
in them, since that's the job of the LCF authority to determine. ;-)

LCF is not, by the way, joined at the hip with AD.  However, in practice, most enterprises
in the world use some form of AD single signon for their web applications, and even if they're
using some repository with its own idea of security, there's a mapping between the AD users
and the repository's users.  Doing that mapping is also the job of the LCF authority for that
repository.

Hope this helps.  Also, I'm not expecting time miracles here, so don't sweat the schedule.


Karl


________________________________________
From: ext Peter Sturge [peter.sturge@googlemail.com]
Sent: Thursday, April 22, 2010 4:27 AM
To: dev@lucene.apache.org
Cc: connectors-user@incubator.apache.org;
lucene-dev@apache.org; connectors-dev@incubator.apache.org
Subject: Re: FW: Solr and LCF security at query time

Hi Karl,

Thanks for the quick turnaround.
I'm in the middle of a product release for us, so I fear I won't be as quick as you... :-)

I couldn't find a simple flow diagram or similar for LCF with regard to security (probably looking
in the wrong place).
Perhaps you could help on these questions...?

In SOLR-1872, the allows and denies are stored (in acl.xml) as sub-queries, which are then
used as filter queries in a user's search.

Are the ACCESS_TOKEN and DENY_TOKEN values whatever have been stored for a particular user
in the underlying acl store (e.g. Active Directory)?
How does AD and/or LCF handle storing such data in its schema? (does AD need its schema extended?)
Presumably, any such AD fields would need to be queried for effective rights in order to cater
for group membership allows and denies.

I guess I'm just trying to understand the architectural flow/storage/retrieval of data in
the various parts of the system, but I admit, I need to do more research on this.
After our product release, when I get a few more spare cycles, I can look at it in more detail.

Many thanks!
Peter



On Thu, Apr 22, 2010 at 1:02 AM, <karl.wright@nokia.com>
wrote:
Hi Peter,

I just committed the promised changes to the LCF Solr output connector.

ACL metadata will now be posted to the Solr Http interface along with the document as the
two following fields:

__ACCESS_TOKEN__document
__DENY_TOKEN__document

There will, of course, potentially be multiple values for each of these two fields.
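Multiple values per field simply means the field name repeats within one document. As an illustration only, the Python below renders a document in Solr's XML update format (`<add><doc><field name="...">`); it is not necessarily how the LCF connector itself transmits documents. The `id` unique-key field and the helper name `solr_add_xml` are assumptions, and the schema would need both token fields declared `multiValued="true"`.

```python
# Illustration: one Solr document carrying several ALLOW/DENY token values.
# Uses Solr's XML update message format; "id" as uniqueKey is an assumption.
import xml.etree.ElementTree as ET


def solr_add_xml(doc_id, allow_tokens, deny_tokens):
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name="id").text = doc_id
    for field_name, values in (("__ACCESS_TOKEN__document", allow_tokens),
                               ("__DENY_TOKEN__document", deny_tokens)):
        for value in values:
            # One <field> element per value; Solr collects them into a list.
            ET.SubElement(doc, "field", name=field_name).text = value
    return ET.tostring(add, encoding="unicode")


print(solr_add_xml("DirectoryA/report.doc",
                   ["myAD:S-123-456-76890", "myAD:S-23-64-12345"], []))
```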

Hope this helps,
Karl

________________________________
From: ext Peter Sturge [mailto:peter.sturge@googlemail.com]
Sent: Tuesday, April 20, 2010 6:51 PM

To: connectors-user@incubator.apache.org
Subject: Re: FW: Solr and LCF security at query time

Hi Karl,

Thanks for the info. I'll have a look at the link and try to take in as much sugar as my insulin
levels will handle...
It sounds like the necessary interface(s) are already in LCF - just a matter of implementing
them in the Solr 1872 plugin.
I'll need to digest the LCF stuff to get to grips with it... please bear with me while I do
that...

When you say:
  The LCF solr output connection doesn't yet do this, but it is trivial for me to make that
happen.
Do you mean a mechanism by which solr.war can get url et al info from its parent container
(Tomcat, Jetty etc.), or have I misinterpreted this?


Thanks,
Peter




On Tue, Apr 20, 2010 at 11:05 PM, <karl.wright@nokia.com>
wrote:
Hi Peter,

I'm the principal committer for LCF, but I don't know as much about Solr as I ought to, so
it sounds like a potentially productive collaboration.

LCF does exactly what you are looking for - the only issue at all is that you need to fetch
a URL from a webapp to get what you are looking for.  The "plugs" are all inside LCF for different
kinds of repositories.  Here's a link that might help with drinking the LCF "koolaid", as
it were: https://cwiki.apache.org/confluence/display/CONNECTORS/Lucene+Connectors+Framework+concepts

The url would be something like this (on a locally installed tomcat-based LCF instance):

http://localhost:8080/lcf-authority-service/UserACLs?username=someusername@somedomain.com

... and this fetch returns something like:

TOKEN:xxxxxxx
TOKEN:yyyyyyy
TOKEN:zzzzzzz
....

... which represent the amalgamated tokens for all of the defined authorities, and by some
strange coincidence ( ;-) ) are compatible with certain pieces of metadata that have been
passed into Solr with each document - one set of Allow tokens, and a second set of Deny tokens.
The LCF Solr output connection doesn't yet do this, but it is trivial for me to make that
happen.
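A client could build the request URL and parse the reply described above roughly as follows. This is a sketch assuming the reply is UTF-8 text with one `TOKEN:` line per token; lines with any other prefix are ignored here, which is a simplification. The base URL is the locally installed Tomcat example from above, and the function names are hypothetical.

```python
# Sketch: query the LCF authority service and collect the TOKEN: lines.
# Response encoding and handling of non-TOKEN lines are assumptions.
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "http://localhost:8080/lcf-authority-service/UserACLs"


def user_acls_url(username, base=BASE):
    # urlencode percent-escapes characters such as '@' in the username.
    return base + "?" + urlencode({"username": username})


def parse_tokens(body):
    prefix = "TOKEN:"
    tokens = []
    for line in body.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            # Keep everything after the prefix, including any ':' in the token.
            tokens.append(line[len(prefix):])
    return tokens


def fetch_tokens(username, base=BASE, timeout=10):
    with urlopen(user_acls_url(username, base), timeout=timeout) as resp:
        return parse_tokens(resp.read().decode("utf-8"))


print(parse_tokens("TOKEN:myAD:S-1-1-0\nTOKEN:myAD:S-23-64-12345\n"))
```

`fetch_tokens` needs a running authority service; `parse_tokens` can be tried on its own.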

Does this sound plausible to you?

Karl


________________________________
From: ext Peter Sturge [mailto:peter.sturge@googlemail.com]
Sent: Tuesday, April 20, 2010 5:41 PM
To: connectors-user@incubator.apache.org;
dev@lucene.apache.org

Subject: Re: FW: Solr and LCF security at query time

Hi Karl,

Integrating LCF to get external token support for SOLR-1872 sounds very interesting indeed.
I don't know anything about LCF, but one of the things I was planning for SOLR-1872 is to
make acl.xml (or rather its behaviour) 'pluggable' - i.e. it would just be one of a series
of plugins that could be used for obtaining back-end authentication information.

If you're good with LCF, perhaps we could work together to build this in. One of the first
things would be defining an interface that would be as easy as possible to plug LCF into.
Have you any suggestions/insight on this front?

Many thanks,
Peter



On Tue, Apr 20, 2010 at 4:08 PM, <karl.wright@nokia.com>
wrote:
SOLR-1872 looks exactly like what I was envisioning, from the search query perspective, although
instead of the acl.xml file you specify, LCF stipulates that you would dynamically query the lcf-authority-service
servlet for the access tokens themselves.  That would get you support for AD, Documentum,
LiveLink, Meridio, and Memex for free. It seems likely that this component could be modified
to work with LCF with minor effort.

The missing component still seems to be AD authentication, which needs a solution.

Karl

________________________________
From: ext Peter Sturge [mailto:peter.sturge@googlemail.com]
Sent: Tuesday, April 20, 2010 10:44 AM
To: dev@lucene.apache.org
Subject: Re: FW: Solr and LCF security at query time

If you want to do this completely within Solr, have a look at:
SOLR-1834 and SOLR-1872. These use a SearchComponent plugin for Solr.

Thanks,
Peter



On Tue, Apr 20, 2010 at 1:25 PM, <karl.wright@nokia.com>
wrote:
FYI

________________________________
From: Wright Karl (Nokia-S/Cambridge)
Sent: Tuesday, April 20, 2010 8:16 AM
To: 'dominique.bejean@eolya.fr'
Cc: 'solr-dev@apache.org';
'connectors-dev@incubator.apache.org';
'connectors-user@incubator.apache.org'
Subject: RE: Solr and LCF security at query time

Dominique,

Yes, I am aware of this ticket and contribution.  Luckily LCF establishes a powerful multi-repository
security model, even though it doesn't yet do the final step of enforcing that model at the
search end.  LCF allows you to define multiple authorities to operate against disparate repositories,
and use the appropriate authority to secure any given document.  The Solr people are aware
of this design, which addresses the issues raised by SOLR-1834 very nicely.  However, as I
said before, time is a problem, and the work still needs to be done.

I suggest you read up on the actual security model of LCF, and perhaps experiment with that
and the SOLR-1834 contribution, to see if there is common ground.  One thing we've learned
at MetaCarta is that post-filtering for security purposes is expensive, and it is better to
modify the queries themselves to restrict the results, if possible.  I'm not sure which approach
SOLR-1834 takes, although it sounds like it might be the filtering approach.  Still, it would
be better than nothing.
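One reason post-filtering is expensive: the client cannot know in advance how many raw hits it must pull to fill a page of visible results. The sketch below shows that cost with a stand-in search callback (`search_page` and `is_allowed` are hypothetical parameters, not a Solr API). With query modification, the engine would return only visible documents, so one page of `wanted` rows would suffice.

```python
# Sketch: post-filtering must over-fetch until enough visible hits are found.
# search_page(start, rows) is a hypothetical stand-in for a search call.

def post_filter_page(search_page, is_allowed, wanted, page_size=50):
    """Return (visible_docs, raw_docs_fetched)."""
    visible, start, fetched = [], 0, 0
    while len(visible) < wanted:
        batch = search_page(start, page_size)
        if not batch:  # index exhausted
            break
        fetched += len(batch)
        visible.extend(d for d in batch if is_allowed(d))
        start += page_size
    return visible[:wanted], fetched


# Toy index: 1000 hits, of which only every tenth is visible to this user.
hits = list(range(1000))
docs, cost = post_filter_page(lambda s, n: hits[s:s + n],
                              lambda d: d % 10 == 0, wanted=10)
print(cost)  # 100 raw hits fetched to show 10
```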

Please let me know what you find out.

Thanks,
Karl

________________________________
From: ext Dominique Bejean [mailto:dominique.bejean@eolya.fr]
Sent: Tuesday, April 20, 2010 8:03 AM
To: Wright Karl (Nokia-S/Cambridge)
Cc: connectors-user@incubator.apache.org;
connectors-dev@incubator.apache.org
Subject: Re: Solr and LCF security at query time

Karl,

Thank you for your reply.

I did some research today and found this:
http://freesurf001.appspot.com/issues.apache.org/jira/browse/SOLR-1834
http://demo.findwise.se:8880/SolrSecurity/

The Solr security model has to be able to filter the result list with items coming from various sources
at the same time (LiveLink, Documentum, file system, ...). Big subject :)

Dominique


On 20/04/10 13:34, karl.wright@nokia.com
wrote:
Hi Dominique,

At the moment, in order to enforce the LCF security model within Lucene/Solr, you will need
to build this functionality into whatever client you are using to display the Lucene search
results.  Specifically, you would need to take the following steps:

(1) Have your users access your search client through Apache.
(2) Use the Apache module mod_auth_kerb, combined with LCF's mod_authz_annotate, to cause
authorization HTTP headers to be transmitted to the client webapp.
(3) Have your client webapp alter whatever queries it is doing, to add an appropriate query
clause for each of the access tokens transmitted in the headers.

(This is how it is done at MetaCarta.)

Alternatively, you may find a way to do this completely with a web application under a Java
app server such as Tomcat.  I have not yet done the research to find out whether this is a
feasible alternative.  Effectively, what you need something like mod_auth_kerb to do is to
authenticate your user against Active Directory, or whoever the authenticator ought to be.
JAAS may be helpful here.

There are, of course, intentions to fill out the missing pieces more completely and transparently
via a Solr search plugin and/or filter.  What has been lacking is time.  If you are in a position
to do development in this area, we're happy to have any assistance you might provide.

Thanks,
Karl
________________________________
From: ext Dominique Bejean [mailto:dominique.bejean@eolya.fr]
Sent: Tuesday, April 20, 2010 5:06 AM
To: connectors-user@incubator.apache.org
Subject: Solr and LCF security at query time

Hi,

I don't see in the LCF wiki how Solr and LCF work together at query time in order to remove from
the result list the items the user is not allowed to access.

In http://cwiki.apache.org/CONNECTORS/lucene-connectors-framework-concepts.html, I only see
these sentences:

" Once all these documents and their access tokens are handed to the search engine, it is
the search engine's job to enforce security by excluding inappropriate documents from the
search results. For Lucene, this infrastructure is expected to be built on top of Lucene's
generic metadata abilities, but has not been implemented at this time."

I am not sure I understand. Does this mean that, for the moment, it is not possible for Solr
to apply security by using an Authority Connector?

Dominique





