lucene-java-user mailing list archives

From "Robichaud, Jean-Philippe" <Jean-Philippe.Robich...@scansoft.com>
Subject RE: ACLs and Lucene
Date Mon, 30 May 2005 15:32:13 GMT
What about:
http://svn.apache.org/viewcvs.cgi/lucene/java/trunk/src/java/org/apache/lucene/index/ParallelReader.java?rev=169859&view=markup

Jp
-----Original Message-----
From: Bruce Ritchie [mailto:bruce@jivesoftware.com] 
Sent: Monday, May 30, 2005 11:26 AM
To: java-user@lucene.apache.org
Subject: RE: ACLs and Lucene

Markus,

> I am working on a Document Management System where every 
> document has an Access Control List attached to it. Obviously 
> a search result should only consist of documents that may be 
> viewed by the currently logged in user.
> 
> I can think of three strategies to accomplish this goal:
> 
> 1) using Filter and FilteredQuery
> 2) filtering the search result
> 3) somehow storing the ACL elements as Lucene fields
> 
> But each approach has serious drawbacks.
> 
> The first one degrades rapidly as the number of documents increases.
> Think of determining the viewability of 10,000 documents 
> where you need several SQL queries per document.
> 
> The second approach also degrades badly when a user has 
> access to a very small subset of all documents. There could 
> be thousands of false hits before the first viewable document 
> is reached.
> 
> The third approach looks most promising to me but would 
> require updating Lucene documents whenever an ACL changes. 
> Unfortunately it is not possible to update Lucene documents 
> without losing fields that are indexed but not stored, right?
> 
> So my question is: is there another approach or a "standard solution"
> I did not think of? Or how did others solve this problem?

We took a combination of the first and the second approach in our
applications. We filter by content area that the user is allowed to view
and then filter the search results that are retrieved. It's actually very
fast for us because we don't have to load the document to check the
permissions - just query an API which caches all the permissions. SQL is
only required for loading the documents that are visible for any given
result page (assuming that the document isn't already loaded into cache).
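
The post-filtering half of this could look roughly like the sketch below. It is
only an illustration under stated assumptions: the class AclPostFilter, the
method visiblePage and the canView predicate are hypothetical names standing in
for the cached permission API, and the ranked hits are plain document ids
rather than Lucene Hits. The sketch walks the ranked hits in order, drops ids
the user may not view without loading any document, and stops as soon as the
requested result page is full.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiPredicate;

// Hypothetical sketch: none of these names come from Lucene or from this thread.
final class AclPostFilter {

    /**
     * Returns one page of document ids the user may view, scanning the
     * ranked hits in order. canView stands in for a cached permission lookup.
     */
    static List<Integer> visiblePage(List<Integer> rankedHits,
                                     String user,
                                     BiPredicate<String, Integer> canView,
                                     int pageIndex,
                                     int pageSize) {
        List<Integer> page = new ArrayList<>();
        if (pageSize <= 0) {
            return page; // nothing to collect
        }
        int toSkip = pageIndex * pageSize; // visible hits that belong to earlier pages
        for (int docId : rankedHits) {
            if (!canView.test(user, docId)) {
                continue; // hidden hit: dropped without loading the document
            }
            if (toSkip > 0) {
                toSkip--; // visible, but on an earlier page
                continue;
            }
            page.add(docId);
            if (page.size() == pageSize) {
                break; // page is full; the remaining hits are never checked
            }
        }
        return page;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the permission cache (assumed data).
        Map<String, Set<Integer>> cache = Map.of("alice", Set.of(2, 3, 5, 8));
        BiPredicate<String, Integer> canView =
                (user, docId) -> cache.getOrDefault(user, Set.of()).contains(docId);

        List<Integer> hits = List.of(1, 2, 3, 4, 5, 6, 7, 8);
        System.out.println(visiblePage(hits, "alice", canView, 0, 2)); // [2, 3]
        System.out.println(visiblePage(hits, "alice", canView, 1, 2)); // [5, 8]
    }
}
```

Only the ids on the returned page would then need to be loaded, from cache or
via SQL. The cost of the scan still grows with the number of hidden hits ranked
ahead of the page, which is why a coarse pre-filter by content area helps.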

The third approach was deemed unusable for the exact reason you outlined.


Regards,

Bruce Ritchie

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

