lucene-solr-user mailing list archives

From "Ard Schrijvers" <>
Subject RE: indexing documents (or pieces of a document) by access controls
Date Thu, 14 Jun 2007 06:48:19 GMT

> When I had that kind of problem (less complex) with Lucene, the only
> idea was to filter from the front end, according to the ACL policy.
> Lucene docs and fields weren't protected, but tagged. Searching was
> always applied with a field "audience", with hierarchical values like
> "public, reserved, protected, secret", so that a "public" document has
> the "secret" value also, to be found with "audience:secret", according
> to the rights of the user who searches. As for the fields, the ones
> not allowed for some users were stripped.
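
The tagging scheme quoted above can be sketched roughly as follows. This is only an illustration in plain Python, not Lucene code: the level names come from the quote, while `audience_tags` and `visible` are hypothetical helpers standing in for "index these values in the audience field" and "filter with audience:<level>".

```python
# Clearance levels from least to most restricted, as listed in the quote.
LEVELS = ["public", "reserved", "protected", "secret"]

def audience_tags(doc_level):
    """Values to index in the 'audience' field: the document's own level
    plus every higher clearance, so a 'public' doc also carries 'secret'."""
    return LEVELS[LEVELS.index(doc_level):]

def visible(docs, user_level):
    """Stand-in for a search filtered with audience:<user_level>.
    docs is a list of (doc_id, doc_level) pairs (hypothetical layout)."""
    return [doc_id for doc_id, level in docs
            if user_level in audience_tags(level)]

docs = [("a", "public"), ("b", "protected"), ("c", "secret")]
print(visible(docs, "protected"))
```

A user with "secret" rights matches every document, because each one carries the "secret" value, while a user with only "public" rights matches only documents whose own level is "public".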

Yes, I know this is a possibility... but we happen to want our authorisation to be facet based.
I am attacking the problem by keeping data derived from Lucene in memory, all translated
into byte/int values. The hardest part is keeping the derived data in sync with Lucene
*and* with the different Jackrabbit users (some have changes in their session but have not
yet saved their data).

Anyway, I can do faceted authorisation + counting in less than 20 ms for 1,000,000 documents
(normal PC), so hopefully I can succeed. I must admit, OTOH, that I did not find some sort of
ingenious algorithm, but merely depend on the speed of the processor: doubling the number
of documents means doubling the response time and the memory needed (though 1,000,000 docs
fit in 25 MB, so 40,000,000 in a GB... that is fine by me).
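
To make the idea concrete, here is a minimal sketch of a linear scan over per-document byte/int arrays. It is an assumption about the general shape of such an approach, not the actual implementation: the array layout, the names `facet_ids`, `acl_ids`, `allowed_acls` and the function `facet_counts` are all hypothetical. (For scale, the figures above work out to roughly 25 bytes per document.)

```python
from array import array
from collections import Counter

def facet_counts(facet_ids, acl_ids, allowed_acls):
    """One pass over all documents: count facet values, but only for
    documents whose ACL id is in the set the user may read.
    Time and memory both grow linearly with the number of documents."""
    counts = Counter()
    for facet_id, acl_id in zip(facet_ids, acl_ids):
        if acl_id in allowed_acls:
            counts[facet_id] += 1
    return counts

# Hypothetical derived data, one entry per document:
facet_ids = array("i", [0, 1, 0, 2, 1])   # int id of the facet value
acl_ids = array("B", [1, 2, 1, 3, 1])     # byte id of the ACL group

counts = facet_counts(facet_ids, acl_ids, allowed_acls={1, 3})
print(dict(counts))
```

Nothing clever happens here: every document is visited once per request, which is why doubling the document count doubles the response time.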

> Maybe you can have a look at the XML database eXist? The search engine,
> XQuery based, is not focused on the same goals as Lucene, but I can
> promise you that queries will never return results from documents
> you are not allowed to read.

I did not look at it, but my feeling is that it is not fast enough.

Regards Ard

> -- 
> Frédéric Glorieux
> École nationale des chartes
> direction des nouvelles technologies et de l'informatique
