lucene-dev mailing list archives

From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: Boolean Scorer
Date Sat, 11 Dec 2004 12:53:48 GMT
On Friday 10 December 2004 21:35, Doug Cutting wrote:
> Christoph Goller wrote:
> > I think we should change BooleanScorer. An easy way would be to sort the
> > bucket list before it is used. Do you think that would affect performance
> > dramatically?
> 
> I think it would make it slower.
> 
> > Otherwise we should reimplement BooleanScorer. I haven't looked into the
> > DisjunctionScorer patch in Bugzilla yet. Maybe it's a good starting point.
> 
> I think we should incorporate Paul's code into CVS.  This algorithm may 
> be slower in some cases, but it may also be faster in some cases.  We 
> should add a static method to switch back to the old implementation, and 
> encourage folks to benchmark their code.  If it proves no slower then we 
> could remove the old implementation altogether.
> 

There may be an alternative: adding skipTo() to the current BooleanScorer.
Before I wrote the alternative boolean scorer, I briefly investigated this
possibility, but I did not see how skipTo() could be added easily.
Nonetheless, it might be possible.

Here is some background on the alternative boolean scorer.
More information is in the Bugzilla posting and in the javadocs:
http://issues.apache.org/bugzilla/show_bug.cgi?id=31785

The core of the DisjunctionScorer is based on a simplification of
SpanOrQuery. In particular, the class DisjunctionScorer.ScorerQueue is a
simplified version of SpanOrQuery.SpanQueue: it only needs to use
document numbers, not term positions.
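
To illustrate the queue idea described above, here is a minimal sketch in
modern Java. It is not the code from the Bugzilla patch: the class
DisjunctionSketch, the DocCursor helper and the method disjunction() are
hypothetical names, and each subscorer is modelled as a strictly increasing
array of document numbers instead of a real Lucene Scorer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch; not the code from the Bugzilla patch. */
class DisjunctionSketch {

    /** Cursor over one subscorer's sorted document numbers (stand-in for a Scorer). */
    static final class DocCursor {
        private final int[] docs;
        private int pos = 0;

        DocCursor(int[] sortedDocs) { this.docs = sortedDocs; }
        boolean exhausted() { return pos >= docs.length; }
        int doc() { return docs[pos]; }
        void next() { pos++; }
    }

    /** Returns {doc, nrMatchers} pairs in increasing document order. */
    static List<int[]> disjunction(int[][] subScorerDocs) {
        // The queue is ordered by current document number only;
        // no term positions are involved.
        PriorityQueue<DocCursor> queue =
            new PriorityQueue<>((a, b) -> Integer.compare(a.doc(), b.doc()));
        for (int[] docs : subScorerDocs) {
            DocCursor cursor = new DocCursor(docs);
            if (!cursor.exhausted()) {
                queue.add(cursor);
            }
        }

        List<int[]> result = new ArrayList<>();
        while (!queue.isEmpty()) {
            int current = queue.peek().doc();
            int nrMatchers = 0;
            // Pop every cursor positioned on the current document,
            // count it, advance it, and re-insert it if it has more documents.
            while (!queue.isEmpty() && queue.peek().doc() == current) {
                DocCursor top = queue.poll();
                nrMatchers++;
                top.next();
                if (!top.exhausted()) {
                    queue.add(top);
                }
            }
            result.add(new int[] {current, nrMatchers});
        }
        return result;
    }
}
```

In this sketch the count collected per document plays the role of the
number of matching subscorers that is later needed for the coordination
factor.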

The existing ConjunctionScorer needed to be slightly extended to implement
NrMatchersScorer, which is a Scorer that also provides the number of
matching subscorers. The number of matchers is needed to pass the
coordination factor back up to the level of the BooleanQuery through the
nested scorers.
If the code of the alternative boolean scorer is added to CVS, the
nrMatchers() method could be merged into the current Scorer.

To complete the alternative boolean scorer, I added scorers for
combining with prohibited scorers and for combining with optional scorers.
These combining scorers were available from an extension of the Surround
query language I posted in April this year.
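
The two kinds of combining scorers can be sketched as follows. This is a
simplified illustration, not the posted code: CombiningSketch, reqExcl()
and reqOpt() are hypothetical names, scorers are modelled as strictly
increasing arrays of document numbers, and the optional side contributes a
matcher count of 1 where a real scorer would add a score.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of combining with prohibited and optional scorers. */
class CombiningSketch {

    /** Documents of the required side that do not occur on the prohibited side. */
    static List<Integer> reqExcl(int[] required, int[] prohibited) {
        List<Integer> out = new ArrayList<>();
        int p = 0;
        for (int doc : required) {
            // Advance the prohibited cursor to its first document >= doc.
            while (p < prohibited.length && prohibited[p] < doc) {
                p++;
            }
            if (p == prohibited.length || prohibited[p] != doc) {
                out.add(doc);
            }
        }
        return out;
    }

    /**
     * Every required document matches; the optional side never removes a
     * document, it only raises the matcher count from 1 to 2.
     * Returns {doc, nrMatchers} pairs.
     */
    static List<int[]> reqOpt(int[] required, int[] optional) {
        List<int[]> out = new ArrayList<>();
        int o = 0;
        for (int doc : required) {
            // Advance the optional cursor to its first document >= doc.
            while (o < optional.length && optional[o] < doc) {
                o++;
            }
            boolean optionalMatches = o < optional.length && optional[o] == doc;
            out.add(new int[] {doc, optionalMatches ? 2 : 1});
        }
        return out;
    }
}
```

In both methods only the required side drives the iteration; the other
side is merely advanced to the current document, which is where a
skipTo()-like operation would be used.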

Mapping the required, optional and prohibited scorers of a BooleanQuery
to a nesting of these combining scorers, DisjunctionScorer and
ConjunctionScorer was straightforward, but a bit tedious.
It is done by the make...SumScorer methods.
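
As a rough illustration of the nesting, the sketch below computes which
documents match for given required, optional and prohibited clauses. It is
an assumption-laden simplification and not the mapping code itself:
NestingSketch and its methods are hypothetical names, it works eagerly on
sets of document numbers instead of nesting scorers, and it ignores
scoring and the coordination factor.

```java
import java.util.List;
import java.util.TreeSet;

/** Hypothetical, set-based sketch of the clause nesting; scoring is ignored. */
class NestingSketch {

    static TreeSet<Integer> toSet(int[] docs) {
        TreeSet<Integer> set = new TreeSet<>();
        for (int doc : docs) {
            set.add(doc);
        }
        return set;
    }

    /** Stand-in for a disjunction: a document matches if any clause matches. */
    static TreeSet<Integer> union(List<int[]> clauses) {
        TreeSet<Integer> set = new TreeSet<>();
        for (int[] clause : clauses) {
            set.addAll(toSet(clause));
        }
        return set;
    }

    /** Stand-in for a conjunction: all clauses must match (needs >= 1 clause). */
    static TreeSet<Integer> intersection(List<int[]> clauses) {
        TreeSet<Integer> set = toSet(clauses.get(0));
        for (int[] clause : clauses.subList(1, clauses.size())) {
            set.retainAll(toSet(clause));
        }
        return set;
    }

    static TreeSet<Integer> matchingDocs(List<int[]> required,
                                         List<int[]> optional,
                                         List<int[]> prohibited) {
        // With required clauses, the optional ones do not change which
        // documents match. Without required clauses, at least one optional
        // clause must match.
        TreeSet<Integer> base =
            required.isEmpty() ? union(optional) : intersection(required);
        // Prohibited clauses are combined by disjunction and then excluded.
        base.removeAll(union(prohibited));
        return base;
    }
}
```

For example, with required clauses {1,2,3,4} and {2,3,4,6}, an optional
clause {3} and a prohibited clause {4}, matchingDocs() yields documents 2
and 3.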

Regards,
Paul Elschot

