lucene-java-user mailing list archives

From Grant Ingersoll <grant.ingers...@gmail.com>
Subject Re: Design Problem: Searching large set of protected documents
Date Tue, 03 Apr 2007 17:18:13 GMT
I seem to recall this type of question coming up from time to time  
over the years, but don't have any specific pointers, so you may find  
it useful to dig into the archives.

-Grant

On Apr 3, 2007, at 12:00 PM, Jonathan O'Connor wrote:

> Erick,
> thanks for the tips. I guess I'll have to dust off my copy of LIA
> (Lucene in Action) and get cracking.
>
> BTW, in our system a user is just a group whose single, fixed member
> is the user itself.
>
> Ciao,
> Jonathan O'Connor
> XCOM Dublin
>
> "Erick Erickson" <erickerickson@gmail.com> wrote on 03/04/2007 16:44
> (to java-user@lucene.apache.org):
>
> Storage isn't much of a problem: about 12.5 MB in total (1,000,000
> documents x 100 users = 100 million bits), since a Lucene Filter is just
> a BitSet, one bit per document, plus some trivial overhead.
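
The 12.5 MB figure follows from the numbers given elsewhere in this thread (a million documents, about 100 users) at one bit per document per user. A minimal Java sketch of that arithmetic; the class and method names are made up for illustration, and the result ignores per-object overhead:

```java
// Illustrative only: estimates raw storage for one bit set per user.
class FilterStorage {

    /** Bytes needed for `users` bit sets of `documents` bits each, rounded up. */
    static long totalBytes(long documents, long users) {
        long bits = documents * users;
        return (bits + 7) / 8;
    }

    public static void main(String[] args) {
        long bytes = totalBytes(1_000_000L, 100L);
        System.out.println(bytes + " bytes"); // prints "12500000 bytes"
    }
}
```
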
>
> As for the computational cost, only you can judge that.
>
> But, is every user allowed individual permissions or are users part of
> groups that have permissions? Filters have logical operators (well,
> actually it's the bitset that has operators), so you may be able to
> save a bunch of time by calculating group permissions
> instead and using the relevant AND/OR/NOT operators. Assuming
> your permissions are additive, this becomes something like...
>
> calculate a filter (BitSet) for each group
> for each user
>   start with an empty filter
>   for each group the user is in
>     OR in that group's filter
>   endfor
>   OR in the user's individual permissions
>   store the filter away
> endfor
>
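> Then simply apply the user's filter whenever you search, for example by
> passing it to the search call along with the query, or by wrapping it in
> a query that you add to a BooleanQuery as a MUST clause.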
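
The group-based precalculation described above can be sketched with plain `java.util.BitSet`, which is what the message says a filter boils down to. This is only a sketch under assumptions: the class, the method names and the sample groups are hypothetical, permissions are assumed to be additive, and no Lucene API is used here.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a plain BitSet stands in for a per-user filter.
class PermissionFilters {

    /** ORs together the filters of the user's groups and the user's own grants. */
    static BitSet userFilter(List<String> groupsOfUser,
                             Map<String, BitSet> groupFilters,
                             BitSet individualGrants) {
        BitSet allowed = new BitSet();
        for (String group : groupsOfUser) {
            BitSet groupFilter = groupFilters.get(group);
            if (groupFilter != null) {
                allowed.or(groupFilter); // additive permissions: take the union
            }
        }
        if (individualGrants != null) {
            allowed.or(individualGrants);
        }
        return allowed;
    }

    /** Keeps only the matching documents that the user may see (logical AND). */
    static BitSet restrict(BitSet matches, BitSet allowed) {
        BitSet visible = (BitSet) matches.clone(); // leave the inputs untouched
        visible.and(allowed);
        return visible;
    }

    /** Helper: builds a bit set from document numbers. */
    static BitSet bits(int... docIds) {
        BitSet set = new BitSet();
        for (int id : docIds) {
            set.set(id);
        }
        return set;
    }

    public static void main(String[] args) {
        Map<String, BitSet> groupFilters = new HashMap<>();
        groupFilters.put("sales", bits(0, 1, 2));
        groupFilters.put("legal", bits(2, 5));

        BitSet allowed = userFilter(Arrays.asList("sales", "legal"), groupFilters, bits(7));
        BitSet visible = restrict(bits(1, 3, 5, 7, 9), allowed);
        System.out.println(visible); // prints "{1, 5, 7}"
    }
}
```

Computing one bit set per group and combining them per user means the expensive permission check runs once per group rather than once per user, which is where the time saving would come from.
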
>
> Don't know if this helps much or not, but it's an idea <G>.
>
> Best
> Erick
>
> On 4/3/07, Jonathan O'Connor <jonathan.oconnor@xcom.de> wrote:
> >
> > Michael,
> > as usual, it's never that easy! Some users can see almost all documents,
> > and some other users can see very few.
> >
> > I did find an interesting document that describes the problem (but offers
> > no solutions :-( ): http://www.ideaeng.com/pub/entsrch/v3n4/article01.html
> >
> > This article talks about early and late binding of security information.
> > Early binding is faster, but harder to implement. And of course, I
> > implemented the easier one.
> >
> > I'm going to see what the computational and storage cost will be if I
> > precalculate this info.
> > Ciao,
> > Jonathan O'Connor
> > XCOM Dublin
> >
> > "Michael D. Curtin" <mike@curtin.com> wrote on 03/04/2007 15:28
> > (to java-user@lucene.apache.org):
> >
> > Jonathan O'Connor wrote:
> >
> > > I have a database of a million documents and about 100 users. The
> > > documents can have an access control list, and there is a complex,
> > > recursive algorithm to say if a particular user can see a particular
> > > document.
> > >
> > > My problem is that my search algorithm first does a standard Lucene
> > > search for matching documents, and then checks security on each one
> > > found, returning only the allowed documents. However, if I do this and
> > > Lucene returns 100,000 docs, but the user can only see 10 of these,
> > > then obviously the search is going to take an awfully long time.
> > >
> > > Has anyone come across this problem before, and if so, what approach
> > > did you take? I guess I could precalculate the permissions for every
> > > user-document pair, but that's a lot of storage, and a lot of
> > > precalculation!
> >
> > My knee-jerk reaction is to suggest a simpler document security model,
> > but I'm guessing that that option isn't available to you.
> >
> > In your example the security attributes of a document are far more
> > discriminating than the query terms. If that relationship is indicative
> > of most of your users and most of the documents, the users and documents
> > aren't updated much, and you have a lot of searching to do,
> > precalculation (with the results stored in an additional document field)
> > seems the way to go. It might even turn out that, if you start from a
> > presumption of calculating every user-document security attribute, you
> > come up with an algorithm that is much more efficient than a one-off,
> > can-this-user-see-this-document type of algorithm.
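
One way to read "an additional document field" is to compute, at indexing time, the list of users allowed to see each document and index that list as extra tokens, so that a search can require the current user's ID as a term. The Java sketch below only builds the field text; the field idea, the class and method names, and the `canSee` predicate (a stand-in for the recursive ACL check) are all assumptions for illustration, not something specified in the thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;
import java.util.function.BiPredicate;

// Illustrative only: builds the text of a hypothetical "allowed users" field.
class AllowedUsersField {

    /**
     * Runs the (expensive) access check once per user for one document and
     * returns the permitted user IDs as space-separated tokens.
     */
    static String valueFor(String docId, List<String> userIds,
                           BiPredicate<String, String> canSee) {
        StringJoiner tokens = new StringJoiner(" ");
        for (String userId : userIds) {
            if (canSee.test(userId, docId)) {
                tokens.add(userId);
            }
        }
        return tokens.toString();
    }

    public static void main(String[] args) {
        List<String> users = Arrays.asList("alice", "bob", "carol");
        // Stand-in for the real ACL check: bob sees nothing, the others see everything.
        BiPredicate<String, String> canSee = (user, doc) -> !user.equals("bob");
        System.out.println(valueFor("doc-1", users, canSee)); // prints "alice carol"
    }
}
```

The trade-off is the one named in the message: any change to users or permissions means recomputing and re-indexing the affected documents, so this only pays off when updates are rare and searches are frequent.
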
> >
> > Precalculation isn't necessarily a bad thing.  Often, it's quite
> > beneficial -- for example, the indexing process itself is a pretty
> > substantial precalculation step!
> >
> > If this seems unwieldy or impractical for some reason, perhaps you could
> > post more attributes of your situation, such as user and data update and
> > addition frequency, query attributes and frequency, and so on.
> >
> > --MDC
> >
> >
> >
> >
> >
> >
> >
> > *** XCOM AG Legal Disclaimer ***
> >
> > This email may contain material that is confidential and for the sole use
> > of the intended recipient. Any review, distribution by others or
> > forwarding without express permission is strictly prohibited. If you are
> > not the intended recipient, please contact the sender and delete all
> > copies.
> >
> > Headquarters: Bahnstrasse 37, D-47877 Willich, VAT ID: DE 812 885 664
> > Communication: Telephone +49 2154 9209-70, Fax +49 2154 9209-900,
> > www.xcom.de
> > Commercial register: Amtsgericht Krefeld, HRB 10340
> > Executive board: Matthias Albrecht, Renate Becker-Grope, Marco Marty,
> > Dr. Rainer Fuchs
> > Chairman of the supervisory board: Stephan Steuer
> >
> >
>

------------------------------------------------------
Grant Ingersoll
http://www.grantingersoll.com/
http://lucene.grantingersoll.com
http://www.paperoftheweek.com/


