lucene-java-user mailing list archives

From "Cedric Ho" <>
Subject performance on filtering against thousands of different publications
Date Mon, 13 Aug 2007 04:17:52 GMT
Hi all,

My problem is as follows:

Each of our documents comes from a publication, and we currently
have > 5000 different publication sources.

Our clients can choose an arbitrary subset of the publications when
performing a search. It is not uncommon for a search to have to
match hundreds or thousands of publications.

I currently index the publication information as a field in
each document and use a TermsFilter when performing a search. However,
the performance is less than satisfactory. Many simple searches take
more than 2-3 seconds. (Our goal: < 0.5 seconds.)
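
For illustration, here is a simplified model of this kind of filtering. It is not Lucene's actual code. It assumes the filter collects the allowed documents into a bit set by walking the posting list of each selected publication. The class name, method name, and in-memory `postings` map are hypothetical stand-ins for the index.

```java
import java.util.BitSet;
import java.util.List;
import java.util.Map;

public class PublicationFilterSketch {
    // Builds the set of allowed document ids by walking the posting list
    // of every selected publication and setting one bit per document.
    // "postings" is a hypothetical in-memory stand-in for the index.
    static BitSet allowedDocs(Map<String, int[]> postings,
                              List<String> selected, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (String publication : selected) {
            int[] docIds = postings.get(publication);
            if (docIds == null) {
                continue; // unknown publication: contributes no documents
            }
            for (int docId : docIds) {
                bits.set(docId);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = Map.of(
            "pubA", new int[] {0, 3},
            "pubB", new int[] {1},
            "pubC", new int[] {2, 4});
        BitSet allowed = allowedDocs(postings, List.of("pubA", "pubC"), 5);
        System.out.println(allowed); // prints {0, 2, 3, 4}
    }
}
```

In this model the work per search grows with the number of selected publications and the number of documents they contain, which would be consistent with searches over thousands of publications being slow.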

Using the CachingWrapperFilter is great for search speed. But I've
done some calculations and figured that it is basically impossible to
cache all combinations of publications or even some common
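
A rough sketch of that kind of calculation follows. The figure of 5000 publications comes from the message. The document count is a made-up number, and the assumption that each cached filter costs one bit per document is an approximation.

```java
import java.math.BigInteger;

public class FilterCacheEstimate {
    // Number of distinct publication subsets a client could select: 2^n.
    static BigInteger subsetCount(int publications) {
        return BigInteger.ONE.shiftLeft(publications);
    }

    // Approximate bytes for one cached filter, assuming one bit per document.
    static long bitSetBytes(long documents) {
        return (documents + 7) / 8;
    }

    public static void main(String[] args) {
        int publications = 5000;       // from the message: "> 5000"
        long documents = 10_000_000L;  // hypothetical index size

        int digits = subsetCount(publications).toString().length();
        System.out.println("possible subsets: a number with "
            + digits + " decimal digits");
        System.out.println("bytes per cached filter: "
            + bitSetBytes(documents));
    }
}
```

Even a tiny fraction of that many subsets could not be held in memory, whatever the real index size is.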

Is there any other more effective way to do the filtering?

(I know that the slowness is not purely due to the publication filter,
we also have some other things that will slow down the search. But
this one definitely contributed quite a lot to the overall search

