lucene-java-user mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: speeding up queries (MySQL faster)
Date Sun, 22 Aug 2004 22:33:15 GMT
Yonik Seeley wrote:
> Setup info & Stats:
> - 4.3M documents, 12 keyword fields per document, 11
  [ ... ]
> "field1:4 AND field2:188453 AND field3:1"
> 
> field1:4      done alone selects around 4.2M records
> field2:188453 done alone selects around 1.6M records
> field3:1      done alone selects around 1K records
> The whole query normally selects less than 50 records
> Only the first 10 are returned (or whatever range
> the client selects).

The "field1:4" clause is probably dominating the cost of query 
execution.  Clauses which match large portions of the collection are 
slow to evaluate.  If there are not too many different such clauses then 
you can optimize this by re-using a Filter in place of such clauses, 
typically a QueryFilter.
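
To make the re-use idea concrete, here is a minimal sketch in plain Java.
It does not use Lucene's API: it only models a filter as a java.util.BitSet
that is computed once for a broad clause and then intersected with the hits
of the selective clauses on later queries.  The class name, the method names
and the toy document ids are all hypothetical, and the evaluator function
stands in for the expensive walk over the clause's postings.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch only: models a cached filter with java.util.BitSet.
// This is NOT Lucene's QueryFilter; all names here are hypothetical.
class CachedClauseFilter {
    // One bit set per clause string, computed once and then re-used.
    private final Map<String, BitSet> cache = new HashMap<>();
    // Counts how often the expensive evaluation actually ran.
    private int evaluations = 0;

    // 'evaluator' stands in for the costly step of evaluating a broad clause.
    BitSet bits(String clause, Function<String, BitSet> evaluator) {
        BitSet cached = cache.get(clause);
        if (cached == null) {
            evaluations++;
            cached = evaluator.apply(clause);
            cache.put(clause, cached);
        }
        return cached;
    }

    int evaluations() {
        return evaluations;
    }

    // Keep only the hits of the selective clauses that the filter also allows.
    static BitSet applyFilter(BitSet selectiveHits, BitSet filterBits) {
        BitSet result = (BitSet) selectiveHits.clone();
        result.and(filterBits);
        return result;
    }

    public static void main(String[] args) {
        CachedClauseFilter filters = new CachedClauseFilter();

        // Toy collection of 10 documents; the broad clause matches ids 0..8.
        Function<String, BitSet> evaluator = clause -> {
            BitSet b = new BitSet(10);
            b.set(0, 9);
            return b;
        };

        // Hits of the remaining, selective clauses (hypothetical ids).
        BitSet selective = new BitSet(10);
        selective.set(2);
        selective.set(5);
        selective.set(9);

        // Two queries share the same broad clause; it is evaluated only once.
        BitSet first = applyFilter(selective, filters.bits("field1:4", evaluator));
        BitSet second = applyFilter(selective, filters.bits("field1:4", evaluator));

        System.out.println("hits: " + first + ", evaluations: " + filters.evaluations());
    }
}
```

The saving comes from the second and later queries: the broad clause costs
one map lookup plus a bitwise AND instead of a fresh pass over millions of
matching documents.  This only pays off if the set of distinct broad clauses
is small enough to keep their bit sets in memory.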

For example, Nutch automatically translates such clauses into 
QueryFilters.  See:

http://cvs.sourceforge.net/viewcvs.py/nutch/nutch/src/java/net/nutch/searcher/LuceneQueryOptimizer.java?view=markup

Note that this only converts clauses whose boost is zero.  Since filters
do not affect ranking, we can only safely convert clauses which do not
contribute to the score, i.e., those whose boost is zero.  Scores might
still differ in the filtered results because of
Similarity.coord().  But in Nutch, Similarity.coord() is overridden to
always return 1.0, so replacing clauses with filters does
not alter the final scores at all.
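
A small worked example of the coord() effect, as a sketch in plain Java
rather than against Lucene's classes.  It assumes the default coordination
factor has the shape overlap / maxOverlap, that a matching zero-boost clause
still counts toward overlap, and that the query has optional clauses (when
every clause is required, overlap equals maxOverlap both before and after
the conversion, so the factor stays at 1).  The class and method names are
hypothetical.

```java
// Sketch only: mirrors the shape of a default coordination factor.
// This is NOT Lucene's Similarity class; all names here are hypothetical.
class CoordExample {
    // Fraction of the query's clauses that a document matched.
    static float defaultCoord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // The approach described for Nutch: ignore how many clauses matched.
    static float constantCoord(int overlap, int maxOverlap) {
        return 1.0f;
    }

    public static void main(String[] args) {
        // Before: three optional clauses, one of them with boost zero.
        // The document matches the zero-boost clause and one other clause.
        float before = defaultCoord(2, 3);

        // After: the zero-boost clause has become a filter, so the query
        // has two clauses and the document matches one of them.
        float after = defaultCoord(1, 2);

        System.out.println("default coord before: " + before + ", after: " + after);
        System.out.println("constant coord before: " + constantCoord(2, 3)
                + ", after: " + constantCoord(1, 2));
    }
}
```

With the default-shaped factor the two values differ, so the same document
would score differently after the conversion; with the constant factor they
are identical, which is why the replacement is score-neutral there.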

Doug
