lucene-dev mailing list archives

Subject Re: MultiSearcher discards interim results
Date Tue, 11 Feb 2003 14:38:25 GMT

Sorry for the delay. I'm having a hardware problem on my home machine and
I'm using webmail now. It might take some time before I can continue.

> I'm confused.  The contract of this method is to return the top-scoring
> nDocs.  For a multi-searcher it must compute the top-scoring nDocs from
> each sub-searcher, then find the top-scoring nDocs among these.

For the first sub-searcher, yes. For later sub-searchers it
is only necessary to keep the documents whose score is not smaller
than the current minimum score.

Worst case: consider what happens when later sub-searchers only find scores
smaller than the minimum score kept by the first sub-searcher.
In that case the current code builds up a full nDocs size priority queue
for each later sub-searcher, and all these results are going to be
discarded.
The patch intends to avoid the housekeeping of the nDocs size priority
queues for the later sub-searchers by using a single priority queue for
all sub-searchers.
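
A minimal sketch of the single-queue idea, in Java. This is not the patch
itself and not Lucene's MultiSearcher code: the class and method names
(SharedTopDocsSketch, Hit, collect, topHits) are hypothetical, and document
ids are assumed to be already offset per sub-searcher. Every sub-searcher
feeds its hits into the same queue, so a hit that scores below the current
minimum is dropped without any queue housekeeping.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical illustration: one bounded queue shared by all sub-searchers. */
final class SharedTopDocsSketch {

    /** A scored hit; the doc id is assumed to be globally unique already. */
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    private final int nDocs;

    // Min-heap on score: the head is the weakest hit currently kept.
    private final PriorityQueue<Hit> queue;

    /** nDocs must be at least 1 (PriorityQueue rejects a smaller capacity). */
    SharedTopDocsSketch(int nDocs) {
        this.nDocs = nDocs;
        this.queue = new PriorityQueue<>(
                nDocs, Comparator.comparingDouble((Hit h) -> h.score));
    }

    /** Called for every hit of every sub-searcher. */
    void collect(int doc, float score) {
        if (queue.size() < nDocs) {
            // Queue not full yet: keep everything.
            queue.add(new Hit(doc, score));
        } else if (score >= queue.peek().score) {
            // "Not smaller than the current minimum": replace the weakest hit.
            queue.poll();
            queue.add(new Hit(doc, score));
        }
        // Otherwise the hit is below the current minimum and is skipped.
    }

    /** Returns the kept hits, best score first. */
    List<Hit> topHits() {
        List<Hit> result = new ArrayList<>(queue);
        result.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return result;
    }
}
```

In the worst case described above, every hit of a later sub-searcher takes
the last branch of collect, so nothing is queued for it at all. A strict
comparison (score > minimum) would also yield a valid top nDocs; the
sketch uses >= only to mirror the wording above.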

The story behind this is that at some point I actually had a very bad
implementation of a multi searcher and retriever using
a TopDocs result from each sub-searcher. The net effect was that not only
were all results kept, but all the stored results also had to be
retrieved before most of them were discarded.
Needless to say, I switched to a home-grown HitCollector
very soon...

As the current MultiSearcher also provides consistent scoring
between databases, I'm going to use it asap.

Are people actually using the nightly builds? I'd also
like to give the scoring explanation facilities a try.

Kind regards,
Ype Kingma
