lucene-solr-user mailing list archives

From Kevin Osborn <>
Subject Re: filter result by catalog
Date Wed, 24 Feb 2010 00:29:14 GMT
Like you, all of my research has led to the conclusion "it depends". For this particular
product, we have an index of about a million documents, and each document can belong to many
catalogs. Initially the number of catalogs will be small, but there could be up to 200 or so
(probably far fewer). So, for simplicity and speed of development, it probably makes the most
sense to just put the list of catalog IDs in the document. Sure, a change in the ACL will force
a re-index of that product, but things don't change that often.
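
To make that concrete, here is a rough sketch of how the catalog list could turn into a
filter at query time. It is only an illustration: the field name `catalog_id` and the helper
`catalog_filter` are made up, and the string it returns is meant to be passed as a filter
query (`fq`) alongside the user's main query.

```python
def catalog_filter(catalog_ids, field="catalog_id"):
    """Build a filter string matching docs that belong to any of the given catalogs.

    Assumes each document carries a multi-valued field (hypothetically
    named `catalog_id`) listing every catalog it belongs to.
    """
    ids = sorted(set(catalog_ids))  # de-duplicate and keep the output stable
    if not ids:
        # An empty clause would be invalid, so refuse rather than guess.
        raise ValueError("at least one catalog ID is required")
    return f"{field}:({' OR '.join(str(i) for i in ids)})"


# A user who may see catalogs 12 and 47:
fq = catalog_filter([47, 12, 47])
print(fq)  # catalog_id:(12 OR 47)
```

With at most 200 or so catalogs, the clause stays small, which is part of why this approach
looks good enough for now.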

From: Chris Hostetter <>
Sent: Tue, February 23, 2010 3:40:39 PM
Subject: Re: filter result by catalog

: Yes I thought about both methods. The ACL method is easier, but has some 
: scalability issues. We use the bitset method in another product, but 
: there are some complexity and resource problems.
: This is a new project so I am revisiting the issue to see if anyone had any better ideas.

The issues with something like this really depend on the specifics ... how 
the rules for what a user is "allowed to see" are defined, how often those 
rules change, how many unique users you have, what kinds of inheritance the 
rules need, etc...

for example: If your rules are as simple as 
* every doc is in exactly one catalog
* no doc ever changes catalog
* some catalogs require subscriber level
* the list of catalogs requiring subscriber level changes daily
...then it makes sense to index the catalog name as part of the 
documents, and have a simple two stage lookup -- pass in "subscriber" 
or "not-subscriber" at runtime, and have a parser that looks at an 
external list of subscriber catalogs and translates that into a filter at 

...if the "subscriber" catalogs never change, you can make it simpler and 
index the subscriber/not-subscriber info directly as a field; if docs 
switch catalogs frequently, or are in multiple catalogs, or there are 
more rules, or more complex hierarchical rules, then the implementation 
becomes more involved.
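
A minimal sketch of the two-stage lookup from the simple case above, with assumptions 
stated up front: the field name `catalog`, the catalog names, and the helper 
`access_filter` are all hypothetical, subscribers are assumed to see every catalog, and 
the external list is just an in-memory set here (in practice it might be a file or table 
that gets reloaded when it changes).

```python
# Hypothetical external list of catalogs that currently require subscriber level.
# It can change daily without any document being re-indexed.
SUBSCRIBER_CATALOGS = {"premium", "archive"}


def access_filter(level, subscriber_catalogs, field="catalog"):
    """Translate a runtime access level into a filter string, or None for no filter.

    Stage 1: the caller passes in "subscriber" or "not-subscriber".
    Stage 2: the current subscriber-only catalogs are looked up and excluded
             for non-subscribers.
    Catalog names are assumed to be simple tokens that need no escaping.
    """
    if level == "subscriber":
        return None  # assumption: subscribers may see every catalog
    if level != "not-subscriber":
        raise ValueError(f"unknown access level: {level!r}")
    restricted = sorted(subscriber_catalogs)
    if not restricted:
        return None  # nothing is restricted today
    # Match everything, then subtract the subscriber-only catalogs.
    return f"*:* -{field}:({' OR '.join(restricted)})"


print(access_filter("not-subscriber", SUBSCRIBER_CATALOGS))
# *:* -catalog:(archive OR premium)
```

Because the catalog name is the only thing stored on each doc, changing which catalogs 
require subscriber level only changes the external list, not the index.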

but there's no single good answer.

