lucene-java-user mailing list archives

From Doron Cohen <DOR...@il.ibm.com>
Subject Re: Performance question
Date Fri, 21 Jul 2006 00:46:34 GMT
> Does it matter what order I add the sub-queries to the BooleanQuery Q.
> That is, is the execution speed for the search faster (slower) if I do:
>             Q.add(Q1, BooleanClause.Occur.MUST);
>             Q.add(Q2, BooleanClause.Occur.MUST);
>             Q.add(Q3, BooleanClause.Occur.MUST);
> As opposed to:
>             Q.add(Q3, BooleanClause.Occur.MUST);
>             Q.add(Q2, BooleanClause.Occur.MUST);
>             Q.add(Q1, BooleanClause.Occur.MUST);
> Or does it matter at all?
>
> There are cases where I know that, for 99% of the time, certain portions
> of my queries are likely to be more selective and I could affect the
> order they get added to the BooleanQuery.  Those of you who know lucene
> internals, is there anything worth doing here?
> Scott

I think the order in which the query is composed does not matter, because
Lucene already takes care of this: the way ConjunctionScorer (in effect
for a boolean AND query) traverses each term's list of doc ids lets the
more selective term lead the query execution. In fact the effect seems to
be stronger than that. Assume an index with 3000 docs, and a query with
3 terms:
   +t1 +t2 +t3
Also assume that t1 is more selective (rare) for docs [d0,d1000), t2 is
more selective for [d1000,d2000), and t3 is more selective for [d2000,d3000).
The way Lucene traverses these doc-id lists would let t1 lead the
computation in the first range [0,1000), t2 lead it in the second range
[1000,2000), and t3 lead it in the third range [2000,3000).
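
To illustrate the idea, here is a minimal sketch of a "leapfrog" style
intersection over sorted doc-id lists. It is not Lucene's ConjunctionScorer
code; the class LeapfrogIntersection, the Cursor helper and the intersect
method are hypothetical names made up for this example, and advance() is a
plain linear scan where a real index would presumably use skip data. What it
shows is that whichever list jumps furthest ahead sets the next target, so
the locally rarest term ends up leading, regardless of the order in which
the lists are passed in.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Lucene's actual implementation.
class LeapfrogIntersection {

    static final int EXHAUSTED = Integer.MAX_VALUE;

    /** Cursor over one sorted doc-id list (a stand-in for a term's postings). */
    static final class Cursor {
        private final int[] docs;
        private int pos = 0;

        Cursor(int[] sortedDocs) {
            this.docs = sortedDocs;
        }

        /** Moves to the first doc id >= target and returns it, or EXHAUSTED. */
        int advance(int target) {
            while (pos < docs.length && docs[pos] < target) {
                pos++;
            }
            return pos < docs.length ? docs[pos] : EXHAUSTED;
        }
    }

    /** Returns the doc ids present in every list; each list must be sorted ascending. */
    static List<Integer> intersect(int[][] postings) {
        List<Integer> hits = new ArrayList<>();
        if (postings.length == 0) {
            return hits;
        }
        Cursor[] cursors = new Cursor[postings.length];
        for (int k = 0; k < postings.length; k++) {
            cursors[k] = new Cursor(postings[k]);
        }

        int target = cursors[0].advance(0); // first candidate doc id
        int agreed = 0;                     // cursors currently sitting on target
        int i = 0;                          // cursor to move next (round-robin)

        while (target != EXHAUSTED) {
            int doc = cursors[i].advance(target);
            if (doc == target) {
                agreed++;
                if (agreed == cursors.length) {
                    hits.add(target);       // every list contains this doc
                    target = target + 1;    // look for the next candidate
                    agreed = 0;
                }
            } else {
                // This list has no entry at target and jumped ahead:
                // it now leads, and the others must catch up to it.
                target = doc;
                agreed = 1;
            }
            i = (i + 1) % cursors.length;
        }
        return hits;
    }
}
```

Calling LeapfrogIntersection.intersect with the same lists in a different
order gives the same hits, and in this sketch the size of each jump is set by
the list that is sparsest around the current position, which matches the
range-by-range behaviour described above.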

Regards,
Doron

