lucene-java-user mailing list archives

From "Greg Conway" <>
Subject query performance behavior not as expected
Date Mon, 29 Aug 2005 15:24:56 GMT
Hello.  I've got a problem that perhaps some of you can help with.

I have an application that has to use fairly long queries (containing about 30 terms OR'ed
together) against an index of about 500K documents.  Because of the limited vocabulary I'm
indexing and querying over (~2000 terms), the size of the query, and the number of documents
involved, my search times are running a little long.  I'd like to speed them up a little more
if possible.

One approach I've tried is structuring the queries such that one (or more) of a subset of
the entire 30 terms is required, the rest being optional, as in:

+(term1 term2 term3 ... term10) term11 term12 term13 ... term30

This yielded an average search time of about 50 msecs.

I then assumed that if I reduced the size of the required set from 10 to 5, I would get fewer
documents to score against and query performance would improve.  So I tried something like:

+(term1 term2 term3 term4 term5) term6 term7 ... term30

To my surprise, the performance of the overall query didn't improve (it was actually slower,
at about 63 msecs on average).  My expectation about the way Lucene would interpret and execute
this query was apparently incorrect.
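
For reference, here is a minimal sketch of how the two query strings compared above could be
generated from one term list, so that only the size of the required group varies between runs.
The class QueryShapes and its build method are hypothetical names used for illustration; they
are not part of Lucene, and the sketch assumes the terms contain no characters that need
escaping for the query parser.

```java
import java.util.List;

// Hypothetical helper (not part of Lucene): builds a query string of the form
// "+(t1 ... tk) tk+1 ... tn" from a term list and a required-group size k.
final class QueryShapes {
    private QueryShapes() {}

    static String build(List<String> terms, int requiredCount) {
        if (requiredCount < 1 || requiredCount > terms.size()) {
            throw new IllegalArgumentException("requiredCount out of range");
        }
        // The first requiredCount terms form the required group:
        // a document must match at least one of them.
        String required = String.join(" ", terms.subList(0, requiredCount));
        StringBuilder sb = new StringBuilder("+(").append(required).append(")");
        // The remaining terms are optional; no escaping is applied here.
        for (String term : terms.subList(requiredCount, terms.size())) {
            sb.append(' ').append(term);
        }
        return sb.toString();
    }
}
```

Calling build(terms, 10) and build(terms, 5) on the same 30-term list would give the two shapes
shown above.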

The obvious answer here might be to use a filter for the first (required) clause and then
query again using that filter for the other terms.  The problem I foresee with that solution
is that I can't easily re-use the filters, both because of the sheer number of combinations of
terms and because I need to re-open my readers/searchers every few minutes to expose the steady
stream of updates to querying.  As I understand it, re-using a filter (rather than creating it,
using it, and discarding it) is integral to its value as a time saver, so a filter may not be
appropriate in this case.
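
To make the re-use concern concrete, here is a rough sketch of the kind of cache this would
need: one bit set per combination of required terms, thrown away whenever the reader is
re-opened.  Everything in it is an assumption for illustration: RequiredSetCache is a
hypothetical class rather than a Lucene one, the generation counter stands in for however the
application tracks reader re-opens, and the builder function stands in for whatever computes
the matching-document bits.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

// Hypothetical cache (not a Lucene class): keeps one bit set per combination
// of required terms and discards all entries when the reader is re-opened.
final class RequiredSetCache {
    private final Map<Set<String>, BitSet> cache = new HashMap<>();
    private long readerGeneration = -1;
    private int builds = 0;

    BitSet get(long generation, Set<String> requiredTerms,
               Function<Set<String>, BitSet> builder) {
        // A new reader generation invalidates every cached entry.
        if (generation != readerGeneration) {
            cache.clear();
            readerGeneration = generation;
        }
        // Defensive copy so later changes to the caller's set cannot alter the key.
        Set<String> key = new TreeSet<>(requiredTerms);
        BitSet bits = cache.get(key);
        if (bits == null) {
            bits = builder.apply(key); // the expensive step that caching tries to avoid
            builds++;
            cache.put(key, bits);
        }
        return bits;
    }

    // Number of times the builder actually ran (cache misses).
    int builds() {
        return builds;
    }
}
```

With many distinct term combinations and a re-open every few minutes, most lookups in such a
cache would be misses, which is the reason for doubting the filter approach here.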

Any thoughts or advice would be appreciated.  Many thanks in advance!

Greg Conway
Textwise Labs
