lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: Wildcard query ...
Date Mon, 13 Oct 2008 18:54:25 GMT

BooleanQuery picks a Scorer based on the number of clauses and what their 
options are ... all of teh scorers it might pick from are smart enough to 
continuously reorder the clauses having them "skip ahead" to the next 
document they match, beyond whatever docIds it already knows can't match 
(based on the skipping of the other clauses.

so it really doesn't amtter what order hte clauses appear in, it will 
optimize away as much work as it can.

while *order* of clauses doesn't matter, *structure* of clauses can -- 
beyond just having subtle scoring differneces, these two queries...

   +(+A:X +B:Z) +(+C:Y +D:Z)
   +(+C:Y +B:Z) +(+A:X +D:Z)

...could have radically differnet performance characteristics, because the 
"skipping" happens at each level of the BooleanQuery hierarchy.  if 
only one doc matches (+A:X +B:Z) then lots of skipping will happen in that 
first query with only a few matches actually being tested for the other 
clauses and the query as a whole -- but if lots of docs match (+C:Y +B:Z) 
and lots of *other* docs match (+A:X +D:Z) the subqueries won't ever skip 
very far.

The other factor to keep in mind is the wildcard expansion ... Honda* will 
be expanded into all terms that start with Honda in that field before 
anything ever looks at what docs match any clauses of your query -- even 
if only one doc matches Type:0  ... which is why index partitioning can 
make sense in some situations like this.

-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message