lucene-java-user mailing list archives

From Doug Cutting
Subject Re: multiple collections indexing
Date Wed, 19 Mar 2003 20:17:34 GMT
Morus Walter wrote:
> Searches must be possible on any combination of collections.
> A typical search includes ~ 40 collections.
> Now the question is how best to implement this in Lucene.
> Currently I see basically three possibilities:
> - create a data field containing the collection name for each document
>   and extend the query by an OR-combined list of queries on this name field.

Are lots of different combinations of collections used frequently? 
Probably not.  If only a handful of different subsets of collections are 
frequently searched, then QueryFilter could be very useful.

In this approach you construct a QueryFilter for each combination of 
collections, passing it the collection name query.  Keep the query 
filter around and re-use it whenever a query with that combination of 
collections is made.  This is very fast.  It uses one bit per document 
per filter.  So if you have a million documents, each filter takes about 
125,000 bytes, and eight common combinations of collections would use 
about one megabyte.
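To make the one-bit-per-document idea concrete, here is a minimal sketch that uses a plain java.util.BitSet as a stand-in. It is not Lucene's QueryFilter; the class name CollectionBitFilter, its methods, and the per-document array of collection names are all assumptions made up for illustration. It shows a filter being built once for a combination of collections, reused against the hits of any query, and the memory arithmetic from the paragraph above.

```java
import java.util.BitSet;
import java.util.Set;

// Hypothetical stand-in for a cached per-combination filter; not a Lucene class.
class CollectionBitFilter {

    // Build the filter once: bit i is set when document i belongs to
    // one of the selected collections.
    public static BitSet build(String[] docCollections, Set<String> selected) {
        BitSet bits = new BitSet(docCollections.length);
        for (int doc = 0; doc < docCollections.length; doc++) {
            if (selected.contains(docCollections[doc])) {
                bits.set(doc);
            }
        }
        return bits;
    }

    // Reuse the filter: keep only those query hits that are also in the filter.
    // The inputs are left unmodified.
    public static BitSet apply(BitSet queryHits, BitSet filter) {
        BitSet result = (BitSet) queryHits.clone();
        result.and(filter);
        return result;
    }

    // One bit per document per filter, rounded up to whole bytes per filter.
    public static long bytesForFilters(long numDocs, int numFilters) {
        long bytesPerFilter = (numDocs + 7) / 8;
        return bytesPerFilter * numFilters;
    }
}
```

With these assumptions, bytesForFilters(1000000L, 8) comes to 1,000,000 bytes, which matches the one megabyte estimate. The expensive step is build; apply is a single bitwise AND, which is why keeping the filter around pays off.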

You could also keep a cache of QueryFilters in a LinkedHashMap (JDK 
1.4).  If the size of the cache exceeds a limit, throw away its eldest 
entry by overriding the removeEldestEntry method.  That way, if any 
combination of collections is possible, but only a few are probable, you 
can just cache the common subsets as QueryFilters.  Probably we should 
provide such a QueryFilterCache class with Lucene...
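A rough sketch of such a cache follows. The class name LruFilterCache and its methods are hypothetical, and the filter type is left generic so the example does not depend on any Lucene class. It uses generics and lambdas, so it needs a newer JDK than 1.4, but the LinkedHashMap and removeEldestEntry mechanism is the one described above. The key is a sorted copy of the collection names, so the same combination given in a different order hits the same entry.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

// Hypothetical bounded cache of filters, keyed by the set of collection names.
class LruFilterCache<F> {
    private final Map<Set<String>, F> cache;

    public LruFilterCache(final int maxEntries) {
        // accessOrder = true: the eldest entry is the least recently used one.
        this.cache = new LinkedHashMap<Set<String>, F>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Set<String>, F> eldest) {
                // Called after each insertion; returning true drops the eldest entry.
                return size() > maxEntries;
            }
        };
    }

    // Return the cached filter for this combination, building it on a miss.
    public synchronized F get(Set<String> collections, Function<Set<String>, F> builder) {
        Set<String> key = new TreeSet<String>(collections); // defensive, order-independent copy
        F filter = cache.get(key);
        if (filter == null) {
            filter = builder.apply(key);
            cache.put(key, filter);
        }
        return filter;
    }

    // containsKey does not count as an access, so it leaves the LRU order alone.
    public synchronized boolean contains(Set<String> collections) {
        return cache.containsKey(new TreeSet<String>(collections));
    }

    public synchronized int size() {
        return cache.size();
    }
}
```

In use, the builder passed to get would be whatever constructs the filter for a combination, for example code that creates a QueryFilter from the collection name query. Methods are synchronized because LinkedHashMap is not thread-safe and, in access-order mode, even a lookup modifies the map.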

This is the approach that I would use.
