lucene-dev mailing list archives

From: Ype Kingma
Subject: Re: MultiSearcher discards interim results
Date: Wed, 12 Feb 2003 21:56:32 GMT

On Tuesday 11 February 2003 21:39, you wrote:
> Ype Kingma wrote:
> >>I'm confused.  The contract of this method is to return the top-scoring
> >>nDocs.  For a multi-searcher it must compute the top-scoring nDocs from
> >>each sub-searcher, then find the top-scoring nDocs among these.  If you
> >
> > For the first sub-searcher: yes. For the remaining sub-searchers it is
> > only necessary to collect docs with a score not smaller than the minimum
> > score provided by the first subsearcher.
> Okay, now I see what you're after.  You wish to minimize the cost of
> maintaining the queue of top-scoring documents.  But does hit queue
> maintenance ever significantly affect overall search performance?  I'd
> be very surprised if it does.  So, while perhaps not optimal, I suspect
> the current implementation is adequate.
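
To make the idea concrete, here is a minimal sketch of the sequential case. It
is not Lucene code: the class and method names are hypothetical, plain float
arrays stand in for the hits of each sub-searcher, and it assumes a recent
Java version. Once the queue holds nDocs entries, any hit that does not beat
the current minimum is skipped without touching the queue.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration, not part of Lucene.
public class ThresholdMerge {

    /** Returns the nDocs highest scores over all sub-searchers, best first. */
    static List<Float> topScores(float[][] scoresPerSearcher, int nDocs) {
        if (nDocs <= 0) {
            return new ArrayList<>();
        }
        // Min-heap: the head is the lowest score currently kept.
        PriorityQueue<Float> queue = new PriorityQueue<>();
        for (float[] scores : scoresPerSearcher) {
            for (float score : scores) {
                if (queue.size() < nDocs) {
                    // Queue not full yet: every hit is collected.
                    queue.add(score);
                } else if (score > queue.peek()) {
                    // Beats the current minimum: replace the weakest entry.
                    queue.poll();
                    queue.add(score);
                }
                // Otherwise the hit cannot improve the result and is skipped.
            }
        }
        List<Float> result = new ArrayList<>(queue);
        result.sort(Collections.reverseOrder());
        return result;
    }
}
```

With nDocs = 3 and the first sub-searcher returning 0.9, 0.2 and 0.5, a later
hit scoring 0.1 is dropped immediately, while a hit scoring 0.7 replaces 0.2.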

It's probably adequate, but I have a typical case of needing the 100 to
200 best results from 15 to 20 searchers, so I don't like to see a suboptimal

> Also, the current approach works well with RemoteSearchable, while I
> suspect your optimized version would not.  And if someone were ever to

Quite so, because the current approach would only need one remote call per 
searcher, but see below.

> implement a parallel (distributed or not) version of MultiSearcher, then
> your optimization would be difficult to implement.

A parallel implementation would need to send the updated minimum score to each
subsearcher, which might substantially reduce the communication requirements.
A bit of delay in this update is quite tolerable.
To balance computation and communication, it might well be necessary to use a
priority queue with an asynchronously updated minimum score for each
subsearcher.
And yes, I hope to use a parallel version in the future: more processors,
more threads, more disks :).
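
A rough sketch of how such a parallel version might look, under the same
assumptions as before: hypothetical names, float arrays standing in for
sub-searcher hits, a recent Java version, and nothing taken from Lucene
itself. Each thread keeps its own queue and publishes its local minimum to a
shared value once that queue is full. The shared value only ever rises, so a
thread reading a stale value merely prunes less; the result stays the same.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.atomic.DoubleAccumulator;

// Hypothetical illustration, not part of Lucene.
public class ParallelThresholdMerge {

    /** Returns the nDocs highest scores over all sub-searchers, best first. */
    static List<Float> topScores(float[][] scoresPerSearcher, int nDocs) {
        if (nDocs <= 0) {
            return new ArrayList<>();
        }
        // Shared minimum score: the largest "nDocs-th best" seen by any thread.
        DoubleAccumulator sharedMin =
                new DoubleAccumulator(Math::max, Double.NEGATIVE_INFINITY);
        List<PriorityQueue<Float>> localQueues = new ArrayList<>();
        List<Thread> threads = new ArrayList<>();
        for (float[] scores : scoresPerSearcher) {
            PriorityQueue<Float> local = new PriorityQueue<>(); // one per thread
            localQueues.add(local);
            Thread t = new Thread(() -> {
                for (float score : scores) {
                    if (score < sharedMin.get()) {
                        continue; // some thread already holds nDocs better hits
                    }
                    if (local.size() < nDocs) {
                        local.add(score);
                    } else if (score > local.peek()) {
                        local.poll();
                        local.add(score);
                    } else {
                        continue; // local queue unchanged, nothing to publish
                    }
                    if (local.size() == nDocs) {
                        // Publish the local minimum; readers may see it late.
                        sharedMin.accumulate(local.peek());
                    }
                }
            });
            threads.add(t);
            t.start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while searching", e);
        }
        // Merge the per-thread queues and keep the overall best nDocs.
        List<Float> merged = new ArrayList<>();
        for (PriorityQueue<Float> q : localQueues) {
            merged.addAll(q);
        }
        merged.sort(Collections.reverseOrder());
        return new ArrayList<>(merged.subList(0, Math.min(nDocs, merged.size())));
    }
}
```

Publishing only when a local queue changes keeps the number of updates small,
which is the point of trading a little delay for less communication.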

> In summary, please tell me of a use case where this optimization
> substantially improves overall performance.

I have not done any real-life measurements because I'm not (yet) using
the development version of Lucene. For the next few months I'm tied to Java
1.1.8. Would that be a problem?

