lucene-java-user mailing list archives

From "Max Pfingsthorn"
Subject RE: ACLs and Lucene
Date Mon, 30 May 2005 08:40:28 GMT

I've got exactly the same problem. Maybe the previously discussed patch could be extended to
fragment the fields of one document into separate files, so that a single fragment can be
updated on its own? Then updating frequently changing fields (like ACLs or other metadata,
maybe even a PageRank value for Nutch?) would be cheaper. It would also make it easy to
'render' ACLs onto the documents they influence at the moment the ACLs change. After all, you
don't change ACLs as often as you access documents. I guess this would be hard, as the lexicon
is stored elsewhere... Any ideas?
It would of course be even better to separate these properly into different indices and be
able to map document IDs across them. Updating would be rather simple, and retrieval could be
done in parallel. Maybe a custom RelationalMultiSearcher would be in order?
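
To make the separate-indices idea concrete, here is a minimal sketch in plain Java. It is not
Lucene API and not the RelationalMultiSearcher itself (which does not exist yet); the class
ParallelIndexSketch and all of its method names are hypothetical. It assumes both structures
share one document ID space, so an ACL change touches only the ACL side and a search is the
intersection of two bit sets.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: two in-memory "indices" over one document ID space.
class ParallelIndexSketch {
    // "Content index": term -> IDs of the documents containing that term.
    private final Map<String, BitSet> contentPostings = new HashMap<>();
    // "ACL index": principal -> IDs of the documents that principal may view.
    private final Map<String, BitSet> aclPostings = new HashMap<>();

    void indexContent(int docId, String... terms) {
        for (String term : terms) {
            contentPostings.computeIfAbsent(term, k -> new BitSet()).set(docId);
        }
    }

    // Replaces the ACL of one document without touching the content postings.
    void setAcl(int docId, String... principals) {
        for (BitSet docs : aclPostings.values()) {
            docs.clear(docId);
        }
        for (String principal : principals) {
            aclPostings.computeIfAbsent(principal, k -> new BitSet()).set(docId);
        }
    }

    // Content hits restricted to the documents the principal may view.
    BitSet search(String term, String principal) {
        BitSet hits = (BitSet) contentPostings.getOrDefault(term, new BitSet()).clone();
        hits.and(aclPostings.getOrDefault(principal, new BitSet()));
        return hits;
    }
}
```

The hard part this sketch skips is keeping document IDs aligned across two real indices when
documents are deleted and segments are merged.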

I've also thought about combining document-based and field-based fragmentation strategies.
Since we will need subsecond search and update performance on a multi-million-document index
in the near future, this seems the way to go. Hardware would not really be an issue here, but
of course we want to be efficient, especially in a multi-processor environment. Have there
been any thoughts about this?

Best regards,

Max Pfingsthorn


Oosteinde 11
1017WT Amsterdam
The Netherlands
Tel  +31 (0)20 5224466
------------------------------------------------------------- /

-----Original Message-----
From: Markus Wiederkehr
Sent: Monday, May 30, 2005 09:47
To: Lucene users
Subject: ACLs and Lucene

I am working on a Document Management System where every document has
an Access Control List attached to it. Obviously a search result
should only consist of documents that may be viewed by the currently
logged in user.

I can think of three strategies to accomplish this goal:

1) using Filter and FilteredQuery
2) filtering the search result
3) somehow storing the ACL elements as Lucene fields

But each approach has serious drawbacks.

The first one degrades rapidly as the number of documents increases.
Think of determining the viewability of 10,000 documents where you
need several SQL queries per document.

The second approach also degrades badly when a user has access to a
very small subset of all documents. There could be thousands of false
hits before the first viewable document is reached.
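
The cost of the second approach can be sketched as follows. This is plain Java, not Lucene
API; the class PostFilterSketch, its method, and the mayView predicate (standing in for
whatever permission check the system uses) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical illustration only: filter ranked hits after the search has run.
class PostFilterSketch {
    // Number of permission checks performed so far.
    int checksPerformed = 0;

    // Walks the ranked hits in order and collects the first `wanted` viewable ones.
    List<Integer> firstViewable(int[] rankedHits, IntPredicate mayView, int wanted) {
        List<Integer> page = new ArrayList<>();
        for (int docId : rankedHits) {
            if (page.size() == wanted) {
                break;
            }
            checksPerformed++;
            if (mayView.test(docId)) {
                page.add(docId);
            }
        }
        return page;
    }
}
```

If only the last few of 10,000 ranked hits are viewable, nearly all 10,000 checks run
before the first page of results is full.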

The third approach looks most promising to me but would require
updating Lucene documents whenever an ACL changes. Unfortunately it is
not possible to update Lucene documents without losing fields that are
indexed but not stored, right?
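
The concern can be illustrated with a toy model in plain Java. This is not Lucene API; the
class UpdateLossSketch and its methods are hypothetical. It assumes an update is done as
"read back the stored fields, delete the document, re-add it", so any field that was indexed
but not stored cannot be rebuilt.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only: a toy index with stored and unstored fields.
class UpdateLossSketch {
    // "field:token" -> IDs of the documents containing that token in that field.
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // docId -> the field values that can be read back later.
    private final Map<Integer, Map<String, String>> stored = new HashMap<>();

    void add(int docId, Map<String, String> storedFields, Map<String, String> unstoredFields) {
        stored.put(docId, new HashMap<>(storedFields));
        indexAll(docId, storedFields);
        indexAll(docId, unstoredFields);
    }

    private void indexAll(int docId, Map<String, String> fields) {
        for (Map.Entry<String, String> e : fields.entrySet()) {
            for (String token : e.getValue().toLowerCase().split("\\s+")) {
                postings.computeIfAbsent(e.getKey() + ":" + token, k -> new HashSet<>()).add(docId);
            }
        }
    }

    void delete(int docId) {
        stored.remove(docId);
        for (Set<Integer> docs : postings.values()) {
            docs.remove(docId);
        }
    }

    // Naive update: only the stored fields survive the delete and re-add.
    void naiveAclUpdate(int docId, String newAcl) {
        Map<String, String> fields = new HashMap<>(stored.get(docId));
        fields.put("acl", newAcl);
        delete(docId);
        add(docId, fields, Collections.<String, String>emptyMap());
    }

    boolean matches(String field, String token, int docId) {
        return postings.getOrDefault(field + ":" + token, Collections.<Integer>emptySet()).contains(docId);
    }
}
```

In this model, a document whose body was indexed but not stored no longer matches body
terms after naiveAclUpdate, unless the original text is fetched again from the source system.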

So my question is: is there another approach or a "standard solution"
I did not think of? Or how did others solve this problem?

Thanks in advance,


