lucene-java-user mailing list archives

From Mark Harwood <markharw...@yahoo.co.uk>
Subject Re: on-the-fly "filters" from docID lists
Date Fri, 23 Jul 2010 06:55:59 GMT
Re scalability of filter construction: the database is likely to hold stable primary keys,
not Lucene doc IDs, which are unstable in the face of updates. You therefore need a quick way
of converting stable database keys read from the DB into current Lucene doc IDs to create
the filter. That could involve a lot of disk seeks unless you cache a pk->docID lookup
in RAM. You should use CachingWrapperFilter too, to cache the computed user permissions from
one search to the next.
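
A minimal sketch of such an in-RAM pk->docID lookup, in plain Java with no Lucene
dependency. The class name PkDocIdCache, its methods, and the input array of primary
keys indexed by doc ID are all assumptions for illustration; in a real index that array
would have to be loaded from a stored or cached key field, and the map rebuilt whenever
the reader is reopened, since doc IDs change on update.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PkDocIdCache {
    // Maps stable database primary keys to current doc IDs.
    private final Map<String, Integer> pkToDocId = new HashMap<>();
    private final int maxDoc;

    // pkByDocId[i] is the primary key of doc ID i, or null if that doc
    // is deleted or has no key (hypothetical input for this sketch).
    public PkDocIdCache(String[] pkByDocId) {
        this.maxDoc = pkByDocId.length;
        for (int docId = 0; docId < pkByDocId.length; docId++) {
            if (pkByDocId[docId] != null) {
                pkToDocId.put(pkByDocId[docId], docId);
            }
        }
    }

    // Converts the keys a user may see into a bit set over doc IDs.
    // Keys that are not in the index are silently skipped.
    public BitSet toDocIdBits(List<String> permittedKeys) {
        BitSet bits = new BitSet(maxDoc);
        for (String pk : permittedKeys) {
            Integer docId = pkToDocId.get(pk);
            if (docId != null) {
                bits.set(docId);
            }
        }
        return bits;
    }
}
```

The resulting bit set is the kind of structure a custom filter could hand back; each
lookup is a hash probe in memory rather than a disk seek.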
This can get messy. If the access permissions are centred around roles/groups, it is normally
faster to tag docs with these group names and query them with the list of roles the user holds.
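
As a rough illustration of the role-tagging approach, the sketch below builds a query
string in Lucene's classic query syntax that requires both the user's query and at least
one of the user's roles. The class RoleQuery, the method restrictToRoles, and the field
name "group" are hypothetical; it also assumes the parser's default operator is OR and
that role names are single tokens containing no query-syntax special characters.

```java
import java.util.List;

public class RoleQuery {
    // Returns "+(userQuery) +groupField:(role1 role2 ...)": the user's query
    // must match, and so must at least one of the listed roles.
    public static String restrictToRoles(String userQuery, String groupField,
                                         List<String> roles) {
        if (roles.isEmpty()) {
            // An empty group clause would not be valid query syntax.
            throw new IllegalArgumentException("user holds no roles");
        }
        StringBuilder sb = new StringBuilder();
        sb.append("+(").append(userQuery).append(") +")
          .append(groupField).append(":(");
        for (int i = 0; i < roles.size(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(roles.get(i));
        }
        return sb.append(')').toString();
    }
}
```

For example, restrictToRoles("content:cars", "group", Arrays.asList("sales", "admins"))
yields +(content:cars) +group:(sales admins).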

If individual user-doc-level perms are required, you could also consider dynamically looking
up perms for just the top n results being shown, at the risk of needing to repeat the query
with a larger n if too few matches pass the lookup.
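
The retry loop for that top-n approach could look something like the sketch below. It is
plain Java; the search function (returning the top n hit keys in rank order) and the
permission predicate are hypothetical stand-ins for the real query and the real
permission lookup, and doubling n on each retry is just one possible growth policy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;
import java.util.function.Predicate;

public class PostFilterSearch {
    // Returns up to `wanted` permitted hits in rank order, re-running the
    // search with a larger n while too few hits pass the permission check.
    public static List<String> permittedTopHits(
            IntFunction<List<String>> search,
            Predicate<String> isPermitted,
            int wanted) {
        if (wanted <= 0) {
            return new ArrayList<>();
        }
        int n = wanted;
        while (true) {
            List<String> hits = search.apply(n);
            List<String> allowed = new ArrayList<>();
            for (String key : hits) {
                if (isPermitted.test(key)) {
                    allowed.add(key);
                    if (allowed.size() == wanted) {
                        return allowed;
                    }
                }
            }
            // Fewer hits than requested means the result set is exhausted.
            if (hits.size() < n) {
                return allowed;
            }
            n *= 2; // repeat the query with a larger n
        }
    }
}
```

Note that each retry re-checks permissions for hits already seen; caching those lookups
would avoid the repeated work.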

Cheers 
Mark
----------------------------------------


On 23 Jul 2010, at 01:55, Michael McCandless <lucene@mikemccandless.com> wrote:

> Well, Lucene can apply such a filter rather quickly; but, your custom
> code first has to build it... so it's really a question of whether
> your custom code can build up / iterate the filter scalably.
> 
> Mike
> 
> On Thu, Jul 22, 2010 at 4:37 PM, Burton-West, Tom <tburtonw@umich.edu> wrote:
>> Hi Mike and Martin,
>> 
>> We have a similar use-case. Is there a scalability/performance issue with the getDocIdSet
>> having to iterate through hundreds of thousands of docIDs?
>> 
>> Tom Burton-West
>> http://www.hathitrust.org/blogs/large-scale-search
>> 
>> -----Original Message-----
>> From: Michael McCandless [mailto:lucene@mikemccandless.com]
>> Sent: Thursday, July 22, 2010 5:20 AM
>> To: java-user@lucene.apache.org
>> Subject: Re: on-the-fly "filters" from docID lists
>> 
>> It sounds like you should implement a custom Filter?
>> 
>> Its getDocIdSet would consult your foreign key-value store and iterate
>> through the allowed docIDs, per segment.
>> 
>> Mike
>> 
>> On Wed, Jul 21, 2010 at 8:37 AM, Martin J <martinj.engine@gmail.com> wrote:
>>> Hello, we are trying to implement a query type for Lucene (with eventual
>>> target being Solr) where the query string passed in needs to be "filtered"
>>> through a large list of document IDs per user. We can't store the user ID
>>> information in the lucene index per document so we were planning to pull the
>>> list of documents owned by user X from a key-value store at query time and
>>> then build some sort of filter in memory before doing the Lucene/Solr query.
>>> For example:
>>> 
>>> content:"cars" user_id:X567
>>> 
>>> would first pull the list of docIDs that user_id:X567 has "access" to from a
>>> keyvalue store and then we'd query the main index with content:"cars" but
>>> only allow the docIDs that came back to be part of the response. The list of
>>> docIDs can near the hundreds of thousands.
>>> 
>>> What should I be looking at to implement such a feature?
>>> 
>>> Thank you
>>> Martin
>>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>> 
>> 
> 
