lucene-java-user mailing list archives

From Michael Wechner <michael.wech...@wyona.com>
Subject Re: performance/scalability issues re filtering of protected search results
Date Tue, 11 Nov 2008 06:02:04 GMT
Erick Erickson wrote:
> This has been discussed more than a few times, I suggest you take
> a look at the searchable archive for things like privileges, access
> privileges, etc. You'll find lots of information faster that way...
>   
You mean Erik Hatcher's answer re SecurityFilter
http://archives.devshed.com/forums/apache-92/lucene-vs-sql-database-1416862.html

or Eugene Dzhurinsky's post where he is also asking re pre-filtering

http://mail-archives.apache.org/mod_mbox/lucene-java-user/200510.mbox/%3c20051005080100.GD457@jdevoff.zssm.zp.ua%3e
http://mail-archives.apache.org/mod_mbox/lucene-java-user/200510.mbox/%3C20051005113829.GA59420@jdevoff.zssm.zp.ua%3E
http://mail-archives.apache.org/mod_mbox/lucene-java-user/200510.mbox/%3c20051005125619.GC59420@jdevoff.zssm.zp.ua%3e

and Erik keeps suggesting to use a filter

http://mail-archives.apache.org/mod_mbox/lucene-java-user/200510.mbox/%3c20051005101406125.00000003632@huixp%3e

whereas Hui also points out that filters are a problem re
performance/scalability

http://mail-archives.apache.org/mod_mbox/lucene-java-user/200510.mbox/%3c20051005101406125.00000003632@huixp%3e

I actually searched quite a bit before my post, but didn't find an
answer that really addressed the performance/scalability issues.
But if it was answered before and somebody knows about it, then I would
very much appreciate any concrete URLs/pointers.
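
To make the question concrete, here is a minimal sketch of the pre-filtering idea from the threads above, in plain Java rather than against the Lucene API. It assumes each document belongs to exactly one branch and each user has a set of permitted branches; the class name, `docBranch`, `allowedDocs` and `preFilter` are all hypothetical names for illustration, and the real hierarchy with shortcuts would need a richer mapping. The per-user bit set is computed once (and again whenever the access rights change), so a query costs one bitwise AND instead of a permission check per hit.

```java
import java.util.BitSet;
import java.util.Set;

// Illustrative only: not Lucene code, just the bit-set idea behind a filter.
class AclPreFilterSketch {

    // Hypothetical mapping: docBranch[docId] is the branch the document lives in.
    // Returns one bit per document, set if the user may see that document.
    static BitSet allowedDocs(int[] docBranch, Set<Integer> userBranches) {
        BitSet allowed = new BitSet(docBranch.length);
        for (int docId = 0; docId < docBranch.length; docId++) {
            if (userBranches.contains(docBranch[docId])) {
                allowed.set(docId);
            }
        }
        return allowed;
    }

    // Pre-filtering: intersect the query matches with the cached permission bits,
    // so protected documents never reach the result list.
    static BitSet preFilter(BitSet matches, BitSet allowed) {
        BitSet visible = (BitSet) matches.clone(); // leave the inputs untouched
        visible.and(allowed);
        return visible;
    }

    public static void main(String[] args) {
        int[] docBranch = {0, 0, 1, 2, 1, 2};               // six documents, three branches
        BitSet allowed = allowedDocs(docBranch, Set.of(1)); // user may see branch 1 only

        BitSet matches = new BitSet();                      // documents matching some query
        matches.set(0);
        matches.set(1);
        matches.set(2);
        matches.set(5);

        System.out.println(preFilter(matches, allowed));    // prints {2}
    }
}
```

With 1 million documents such a bit set is roughly 125 KB per user, so the open question is the cost of keeping about 3000 of them current when the access rights change every day. Sharing one bit set per branch or group and combining them per user at query time might reduce that cost, but I have not measured it.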

Thanks

Michael
> Best
> Erick
>
> On Mon, Nov 10, 2008 at 2:52 PM, Michael Wechner
> <michael.wechner@wyona.com> wrote:
>
>   
>> Hi
>>
>> We have about 1 million documents (and growing) in a hierarchical order (3 to
>> 20 levels deep) and about 3000 people accessing these nodes. Some people
>> have access to certain branches, other people to other branches, and some
>> branches are shared. The access control of these nodes changes every day
>> and also contains shortcuts which allow people to glimpse into parts of
>> branches which they otherwise do not have access to.
>>
>> Currently we have one index for all nodes, which is ok
>> performance/scalability wise, but before displaying the results we need to
>> filter based on the access privileges each user has, which is very bad
>> performance wise, because it might be that the first 10K hits are all
>> protected re this user and hence it can take a very long time until one
>> finally finds a result that the user is actually allowed to see.
>>
>> We were thinking about introducing an index for each user which only
>> contains the documents that user is actually allowed to see, but this
>> doesn't scale well either as the number of users grows.
>>
>> Any hints how other people are approaching such a situation would be very
>> much appreciated.
>>
>> Thanks
>>
>> Michael
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>     
>
>   

