lucene-dev mailing list archives

From Ype Kingma <ykin...@xs4all.nl>
Subject MultiSearcher discards interim results
Date Mon, 03 Feb 2003 22:23:50 GMT
Dear developers,

In MultiSearcher, the method
    public TopDocs search(Query query, Filter filter, int nDocs)
contains an
    else break;
which discards previous interim results.

Since I expect to need on the order of the 100 best results from
20 databases on a regular basis, I don't really like this.

This is the current code:

    for (int i = 0; i < searchables.length; i++) {   // search each searcher
      TopDocs docs = searchables[i].search(query, filter, nDocs);
      totalHits += docs.totalHits;                   // update totalHits
      ScoreDoc[] scoreDocs = docs.scoreDocs;
      for (int j = 0; j < scoreDocs.length; j++) {   // merge scoreDocs into hq
        ScoreDoc scoreDoc = scoreDocs[j];
        if (scoreDoc.score >= minScore) {
          scoreDoc.doc += starts[i];                 // convert doc
          hq.put(scoreDoc);                          // update hit queue
          if (hq.size() > nDocs) {                   // if hit queue overfull
            hq.pop();                                // remove lowest in hit queue
            minScore = ((ScoreDoc)hq.top()).score;   // reset minScore
          }
        } else
          break;                                     // no more scores > minScore
      }
    }


Attached is an untested patch for this. It works by implementing
a MultiCollector that has the state to collect results from
the subsearchers without discarding interim results.
The patch is a diff -c against current CVS.
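
To make the idea concrete, here is a minimal sketch of a collector that keeps its state across subsearchers. It is not the attached patch and does not use the Lucene API: the class MergingCollectorSketch, the Hit record and the collect(docBase, localDoc, score) signature are all hypothetical names chosen for illustration, and the sketch is written in current Java (generics, lambdas). It only shows how a single bounded queue can accept hits from every subsearcher, with each hit compared against the weakest hit retained so far.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical stand-in for a collector shared by all subsearchers. */
class MergingCollectorSketch {

    /** Hypothetical hit record: a global document number and its score. */
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    private final int maxHits;
    // Min-heap on score, so the weakest retained hit is always at the head.
    private final PriorityQueue<Hit> queue;
    private int totalHits = 0;

    MergingCollectorSketch(int maxHits) {
        if (maxHits <= 0) {
            throw new IllegalArgumentException("maxHits must be positive");
        }
        this.maxHits = maxHits;
        this.queue = new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
    }

    /**
     * Offer one hit from a subsearcher. docBase plays the role of starts[i]:
     * it shifts the subsearcher's local document number into the global range.
     */
    void collect(int docBase, int localDoc, float score) {
        totalHits++;
        if (queue.size() < maxHits) {
            // Queue not yet full: keep the hit unconditionally.
            queue.add(new Hit(docBase + localDoc, score));
        } else if (score > queue.peek().score) {
            // Queue full: replace the weakest retained hit with a stronger one.
            queue.poll();
            queue.add(new Hit(docBase + localDoc, score));
        }
    }

    /** Number of hits offered, whether or not they were retained. */
    int totalHits() {
        return totalHits;
    }

    /** Retained hits, best score first. */
    List<Hit> topHits() {
        List<Hit> result = new ArrayList<>(queue);
        result.sort((a, b) -> Float.compare(b.score, a.score));
        return result;
    }
}
```

In this sketch one instance would be created per query and handed to each subsearcher in turn, so the queue, and with it the effective minimum score, carries over from one subsearcher to the next.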

I'd like to add some test cases, but before I do that
I'd prefer to have comments.

I checked the test cases for MultiSearcher, but they don't
seem to exercise the code in the patch.
The existing test-unit build runs fine with the patch.

Regards,
Ype
