lucene-dev mailing list archives

From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: [jira] Commented: (LUCENE-584) Decouple Filter from BitSet; relation with LUCENE-730
Date Sat, 14 Apr 2007 10:31:14 GMT
Hoss,

This is a bit long, sorry for that; sometimes things are just as complex as they are.

On Saturday 14 April 2007 01:13, Chris Hostetter wrote:
> 
...
> 
> I don't get it, how would a Scorer not implement skipTo? ...oh...
> 
> 	final class BooleanScorer extends Scorer {
> 	  ...
> 	  public boolean skipTo(int target) {
> 	    throw new UnsupportedOperationException();
> 	  }

Some history on the underlying reason for this:

Once upon a time no Scorer would implement skipTo().
Most people would use BooleanScorer for queries with multiple terms, and 
things worked well with the Scorer.next() method, especially for 
disjunctions. Occasionally documents would be scored out of document order, 
but that did not lead to problems because Hits would reorder the documents by 
score value anyway.

Then skipTo() was added to improve the speed of conjunctions. To do this, each 
Scorer needs to score all documents in document number order and implement 
skipTo(), because skipTo() is used by ConjunctionScorer. BooleanScorer will 
only use ConjunctionScorer in very specific (but also frequently occurring) 
circumstances. At this point the index format was also changed to include the 
skip forward information.
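A minimal sketch of why ConjunctionScorer needs skipTo() and document order. It assumes a hypothetical DocIter interface that mirrors the old Scorer next()/doc()/skipTo() contract; it is not Lucene's actual code. Every clause jumps straight to the largest current document, so runs of documents that cannot match all clauses are skipped instead of visited one by one. ArrayDocIter scans linearly; in a real index the skip forward information is what makes such a jump cheap.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical doc-id iterator mirroring the old Scorer next()/doc()/skipTo() contract. */
interface DocIter {
    boolean next();             // move to the next doc; false when exhausted
    int doc();                  // current doc id
    boolean skipTo(int target); // move to the first doc >= target (always advancing); false when exhausted
}

/** Sorted doc ids standing in for one term's postings (hypothetical helper). */
final class ArrayDocIter implements DocIter {
    private final int[] docs;
    private int pos = -1;

    ArrayDocIter(int... docs) { this.docs = docs; }

    public boolean next() { return ++pos < docs.length; }

    public int doc() { return docs[pos]; }

    public boolean skipTo(int target) {
        // Linear scan for clarity; an index with skip data can jump ahead instead.
        do { pos++; } while (pos < docs.length && docs[pos] < target);
        return pos < docs.length;
    }
}

final class Conjunction {
    /** Leapfrog intersection; every input must deliver docs in increasing order. */
    static List<Integer> intersect(DocIter... iters) {
        List<Integer> out = new ArrayList<>();
        if (iters.length == 0) return out;
        for (DocIter it : iters) {
            if (!it.next()) return out; // an empty clause means no matches at all
        }
        while (true) {
            // The largest current doc is the earliest doc that could match every clause.
            int target = 0;
            for (DocIter it : iters) target = Math.max(target, it.doc());
            boolean aligned = true;
            for (DocIter it : iters) {
                if (it.doc() < target) {
                    if (!it.skipTo(target)) return out;      // one clause exhausted: done
                    if (it.doc() != target) aligned = false; // overshot: retry with a larger target
                }
            }
            if (aligned) {
                out.add(target);                  // all clauses are on the same doc
                if (!iters[0].next()) return out; // advance one clause; the others catch up via skipTo
            }
        }
    }
}
```

For example, intersecting {1, 3, 5, 7, 9, 11}, {3, 4, 7, 11, 12} and {0, 3, 7, 8, 11} yields 3, 7, 11. Note that the whole approach depends on each input being in document order, which is exactly what an out-of-order scorer cannot offer.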

As I said, the implementation of disjunctions in BooleanScorer does not score 
documents strictly in document order. It can be made to do that, but that 
would lead to some loss of performance. BooleanScorer uses a kind of 
distributive sort that is faster than the priority queue used by 
DisjunctionSumScorer.
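To illustrate why such a disjunction is not in document order, here is a rough sketch of window-at-a-time scoring with a bucket table. The window size, the names, and the use of a simple count of matching clauses as the score are assumptions for illustration, not BooleanScorer's actual code. Each clause drops its documents from the current window into buckets indexed by doc - base; touched buckets are chained by prepending to a linked list, so within a window documents come out in reverse first-touch order rather than by document number.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a bucket-table ("distributive sort") disjunction; hypothetical, not Lucene's code. */
final class BucketDisjunction {
    static final int WINDOW = 8; // hypothetical tiny window so the example stays readable

    /**
     * postings[c] holds the sorted doc ids of clause c; all ids must be < maxDoc.
     * Returns {doc, matchingClauseCount} pairs in the order the buckets are collected.
     */
    static List<int[]> score(int[][] postings, int maxDoc) {
        List<int[]> hits = new ArrayList<>();
        int[] pos = new int[postings.length]; // read position per clause
        int[] count = new int[WINDOW];        // per-bucket score accumulator
        int[] stamp = new int[WINDOW];        // base + 1 of the window that last touched the bucket
        int[] nextTouched = new int[WINDOW];  // singly linked list of touched buckets
        for (int base = 0; base < maxDoc; base += WINDOW) {
            int head = -1;
            int end = base + WINDOW;
            for (int c = 0; c < postings.length; c++) {
                int[] docs = postings[c];
                while (pos[c] < docs.length && docs[pos[c]] < end) {
                    int slot = docs[pos[c]++] - base;
                    if (stamp[slot] != base + 1) { // first clause to hit this doc in this window
                        stamp[slot] = base + 1;
                        count[slot] = 0;
                        nextTouched[slot] = head;  // prepend: list order is NOT doc order
                        head = slot;
                    }
                    count[slot]++;
                }
            }
            // Collect the window: no sorting, just walk the linked list of touched buckets.
            for (int s = head; s != -1; s = nextTouched[s]) {
                hits.add(new int[] {base + s, count[s]});
            }
        }
        return hits;
    }
}
```

For clauses {1, 5, 9} and {2, 5, 10} with a window of 8, the documents come out as 2, 5, 1, 10, 9, with doc 5 counted twice. Windows are processed in increasing order, but inside a window there is no cheap way to offer skipTo().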

Then BooleanScorer2 came along. BooleanScorer2 uses ConjunctionScorer in more 
circumstances than BooleanScorer, and it uses DisjunctionSumScorer for 
disjunctions. LUCENE-730 is an attempt to get the top-level disjunction 
performance of BooleanScorer back.

Disjunctions below the top level, for example in a query like this:
+(a1 a2) +(b1 b2)
need skipTo() (called from ConjunctionScorer) on the two nested disjunctions, 
and for that DisjunctionSumScorer is used. Currently for the top level 
disjunction case:
a1 a2 b1 b2
DisjunctionSumScorer is normally used. But when the setUseScorer14() method is 
used, BooleanScorer will (always?) be used. The patch at LUCENE-584 tries to 
handle this setUseScorer14() case by also keeping the old filtering method, 
which checks the bits individually in IndexSearcher.
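For comparison, a sketch of a priority-queue disjunction in the spirit of DisjunctionSumScorer, again with a hypothetical DocIter interface and not Lucene's actual code. It delivers documents in increasing order and implements skipTo(), so it can be nested under a conjunction as in +(a1 a2) +(b1 b2); the price is the heap maintenance for every document. matchers() is a hypothetical stand-in for the summed score.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical doc-id iterator mirroring the old Scorer next()/doc()/skipTo() contract. */
interface DocIter {
    boolean next();
    int doc();
    boolean skipTo(int target); // first doc >= target, always advancing
}

/** Sorted doc ids standing in for one term's postings (hypothetical helper). */
final class ArrayDocIter implements DocIter {
    private final int[] docs;
    private int pos = -1;
    ArrayDocIter(int... docs) { this.docs = docs; }
    public boolean next() { return ++pos < docs.length; }
    public int doc() { return docs[pos]; }
    public boolean skipTo(int target) {
        do { pos++; } while (pos < docs.length && docs[pos] < target);
        return pos < docs.length;
    }
}

/** Doc-order disjunction: a heap of sub-iterators keyed on their current doc. */
final class HeapDisjunction implements DocIter {
    private final DocIter[] subs;
    private final PriorityQueue<DocIter> heap =
        new PriorityQueue<>(Comparator.comparingInt(DocIter::doc));
    private final List<DocIter> atCurrent = new ArrayList<>(); // subs positioned on doc()
    private boolean started = false;
    private int current = -1;

    HeapDisjunction(DocIter... subs) { this.subs = subs; }

    public int doc() { return current; }

    /** Number of clauses matching the current doc. */
    int matchers() { return atCurrent.size(); }

    public boolean next() {
        // Only the subs on the current doc need to move; they are outside the heap.
        for (DocIter s : started ? atCurrent.toArray(new DocIter[0]) : subs) {
            if (s.next()) heap.add(s);
        }
        started = true;
        atCurrent.clear();
        return gather();
    }

    public boolean skipTo(int target) {
        for (DocIter s : started ? atCurrent.toArray(new DocIter[0]) : subs) {
            if (s.skipTo(target)) heap.add(s);
        }
        started = true;
        atCurrent.clear();
        while (!heap.isEmpty() && heap.peek().doc() < target) {
            DocIter s = heap.poll(); // remove before advancing: its heap key changes
            if (s.skipTo(target)) heap.add(s);
        }
        return gather();
    }

    /** Pulls every sub-iterator positioned on the smallest doc out of the heap. */
    private boolean gather() {
        if (heap.isEmpty()) return false;
        current = heap.peek().doc();
        while (!heap.isEmpty() && heap.peek().doc() == current) atCurrent.add(heap.poll());
        return true;
    }
}
```

With clauses {1, 4, 9} and {2, 4, 12}, next() yields 1, 2, 4, 9, 12 (two matchers on doc 4), and skipTo(5) on a fresh instance lands on 9.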
LUCENE-730 will always use BooleanScorer for the top level disjunctions, so 
with a bit of luck the setUseScorer14 method can also be deprecated/removed.

LUCENE-584 has another possible performance advantage: it allows filtering 
to be implemented with a ConjunctionScorer directly instead of in 
IndexSearcher, but that still needs to be added.
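The two filtering styles can be contrasted with java.util.BitSet. filterByBits mirrors checking the bits for each hit, which works for hits in any order; filterByLeapfrog treats the filter as one more sorted doc-id stream and intersects it with the hits the way a ConjunctionScorer would, which requires hits in document order. The class and method names are hypothetical, and plain int arrays stand in for Scorers.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical contrast of per-hit bit checks and conjunction-style filtering. */
final class FilterSketch {
    /** Check each hit against the filter bits; hit order does not matter. */
    static List<Integer> filterByBits(int[] hits, BitSet bits) {
        List<Integer> out = new ArrayList<>();
        for (int doc : hits) {
            if (bits.get(doc)) out.add(doc);
        }
        return out;
    }

    /** Leapfrog the sorted hits with the filter's set bits, skipping non-matching runs on both sides. */
    static List<Integer> filterByLeapfrog(int[] sortedHits, BitSet bits) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        int f = bits.nextSetBit(0); // -1 when no bits are set
        while (i < sortedHits.length && f >= 0) {
            int doc = sortedHits[i];
            if (doc == f) {
                out.add(doc);
                i++;
                f = bits.nextSetBit(doc + 1);
            } else if (doc < f) {
                // Stands in for hitScorer.skipTo(f); a linear scan here.
                while (i < sortedHits.length && sortedHits[i] < f) i++;
            } else {
                f = bits.nextSetBit(doc); // the filter side skips ahead to the hit
            }
        }
        return out;
    }
}
```

With hits {1, 3, 4, 7, 10, 15} and filter bits {3, 7, 8, 15, 20}, both methods return 3, 7, 15, but only filterByBits also works on out-of-order hits such as {15, 1, 3}.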

Regards,
Paul Elschot

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

