From lucene-dev-return-3081-qmlist-jakarta-archive-lucene-dev=nagoya.apache.org@jakarta.apache.org Mon Feb 03 22:50:17 2003
Mailing-List: contact lucene-dev-help@jakarta.apache.org; run by ezmlm
List-Id: "Lucene Developers List"
Reply-To: "Lucene Developers List"
Message-Id: <200302032250.h13MnxJH079371@smtpzilla5.xs4all.nl>
From: Ype Kingma
To: "Lucene Developers List"
Subject: MultiSearcher discards interim results
Date: Mon, 3 Feb 2003 23:23:50 +0100
X-Mailer: KMail [version 1.3.1]
MIME-Version: 1.0
Content-Type: Multipart/Mixed; boundary="------------Boundary-00=_QJ7ROP4BVZCQUY21NGKI"

--------------Boundary-00=_QJ7ROP4BVZCQUY21NGKI
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 8bit

Dear developers,

In MultiSearcher,

    public TopDocs search(Query query, Filter filter, int nDocs)

contains an

    else break;

which discards previous interim results. Since I expect to need on the
order of 100 best results from 20 databases on a regular basis, I don't
really like this.

This is the current code:

    for (int i = 0; i < searchables.length; i++) { // search each searcher
      TopDocs docs = searchables[i].search(query, filter, nDocs);
      totalHits += docs.totalHits;                 // update totalHits
      ScoreDoc[] scoreDocs = docs.scoreDocs;
      for (int j = 0; j < scoreDocs.length; j++) { // merge scoreDocs into hq
        ScoreDoc scoreDoc = scoreDocs[j];
        if (scoreDoc.score >= minScore) {
          scoreDoc.doc += starts[i];               // convert doc
          hq.put(scoreDoc);                        // update hit queue
          if (hq.size() > nDocs) {                 // if hit queue overfull
            hq.pop();                              // remove lowest in hit queue
            minScore = ((ScoreDoc)hq.top()).score; // reset minScore
          }
        } else
          break;                                   // no more scores > minScore
      }
    }

Attached is an untested patch for this. It works by implementing a
MultiCollector that holds the state needed to collect results from the
subsearchers without discarding interim results. The patch is a diff -c
against current CVS.

I'd like to add some test cases, but before I do that I'd prefer to have
comments. I checked the test cases for MultiSearcher, but they don't seem
to exercise the code in the patch. The existing test-unit build runs fine
with the patch.
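To make the idea easier to comment on without applying the patch, here is a
minimal sketch of a single collector shared by all subsearchers, written in
plain Java so it runs on its own. It is not the patch itself and it does not
use Lucene classes: SharedTopCollector, Hit, setStart and topHits are names
made up for this example, java.util.PriorityQueue stands in for HitQueue, and
the arrays of scores in main stand in for what the subsearchers would report.
Keeping the earlier hit when two scores tie is a choice of this sketch only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical stand-in for a shared hit collector; not Lucene's API. */
final class SharedTopCollector {

  /** One collected hit: a global document number and its score. */
  static final class Hit {
    final int doc;
    final float score;
    Hit(int doc, float score) { this.doc = doc; this.score = score; }
  }

  private final int nDocs;
  // Min-heap on score: the head is always the weakest hit currently kept.
  private final PriorityQueue<Hit> queue;
  private int totalHits = 0;
  private int start = 0; // doc-number offset of the sub-index being searched

  SharedTopCollector(int nDocs) {
    if (nDocs < 1) throw new IllegalArgumentException("nDocs must be >= 1");
    this.nDocs = nDocs;
    this.queue = new PriorityQueue<>(nDocs, (a, b) -> Float.compare(a.score, b.score));
  }

  /** Call before searching each sub-index, like starts[i] in MultiSearcher. */
  void setStart(int start) { this.start = start; }

  /** Called once per matching document, with a doc number local to the sub-index. */
  void collect(int doc, float score) {
    totalHits++;
    if (queue.size() < nDocs) {
      queue.add(new Hit(doc + start, score));   // still room: keep the hit
    } else if (score > queue.peek().score) {
      queue.poll();                             // drop the weakest kept hit
      queue.add(new Hit(doc + start, score));   // and keep the better one
    }
  }

  int totalHits() { return totalHits; }

  /** Drains the queue and returns the kept hits, best score first. */
  List<Hit> topHits() {
    List<Hit> hits = new ArrayList<>();
    while (!queue.isEmpty()) hits.add(queue.poll());
    Collections.reverse(hits);
    return hits;
  }

  public static void main(String[] args) {
    // Assumed sample data: scores reported by two sub-indexes.
    float[][] scoresPerIndex = { {0.9f, 0.2f}, {0.5f, 0.8f, 0.1f} };
    int[] starts = {0, 2};
    SharedTopCollector collector = new SharedTopCollector(3);
    for (int i = 0; i < scoresPerIndex.length; i++) {
      collector.setStart(starts[i]);
      for (int doc = 0; doc < scoresPerIndex[i].length; doc++) {
        collector.collect(doc, scoresPerIndex[i][doc]);
      }
    }
    for (Hit h : collector.topHits()) System.out.println(h.doc + " " + h.score);
    System.out.println("total hits: " + collector.totalHits());
  }
}
```

Because the queue and the running minimum live in one object for the whole
search, no per-subsearcher TopDocs array has to be built and then partly
thrown away during the merge; each hit is compared against the global
threshold once, when it is reported.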
Regards, Ype --------------Boundary-00=_QJ7ROP4BVZCQUY21NGKI Content-Type: text/x-diff; charset="iso-8859-1"; name="patchMultiSearcher1.txt" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="patchMultiSearcher1.txt" SW5kZXg6IGpha2FydGEtbHVjZW5lL3NyYy9qYXZhL29yZy9hcGFjaGUvbHVjZW5lL3NlYXJjaC9N dWx0aVNlYXJjaGVyLmphdmEKPT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09 PT09PT09PT09PT09PT09PT09PT09PT09PT09PQpSQ1MgZmlsZTogL2hvbWUvY3ZzcHVibGljL2ph a2FydGEtbHVjZW5lL3NyYy9qYXZhL29yZy9hcGFjaGUvbHVjZW5lL3NlYXJjaC9NdWx0aVNlYXJj aGVyLmphdmEsdgpyZXRyaWV2aW5nIHJldmlzaW9uIDEuMTAKZGlmZiAtYyAtcjEuMTAgTXVsdGlT ZWFyY2hlci5qYXZhCioqKiBqYWthcnRhLWx1Y2VuZS9zcmMvamF2YS9vcmcvYXBhY2hlL2x1Y2Vu ZS9zZWFyY2gvTXVsdGlTZWFyY2hlci5qYXZhCTI5IEphbiAyMDAzIDE3OjE4OjU0IC0wMDAwCTEu MTAKLS0tIGpha2FydGEtbHVjZW5lL3NyYy9qYXZhL29yZy9hcGFjaGUvbHVjZW5lL3NlYXJjaC9N dWx0aVNlYXJjaGVyLmphdmEJMyBGZWIgMjAwMyAyMjo0MzozNSAtMDAwMAoqKioqKioqKioqKioq KioKKioqIDE0MSwxNzUgKioqKgogICAgICByZXR1cm4gbWF4RG9jOwogICAgfQogIAogICAgcHVi bGljIFRvcERvY3Mgc2VhcmNoKFF1ZXJ5IHF1ZXJ5LCBGaWx0ZXIgZmlsdGVyLCBpbnQgbkRvY3Mp CiAgICAgICAgdGhyb3dzIElPRXhjZXB0aW9uIHsKISAgICAgSGl0UXVldWUgaHEgPSBuZXcgSGl0 UXVldWUobkRvY3MpOwohICAgICBmbG9hdCBtaW5TY29yZSA9IDAuMGY7CiEgICAgIGludCB0b3Rh bEhpdHMgPSAwOwohIAohICAgICBmb3IgKGludCBpID0gMDsgaSA8IHNlYXJjaGFibGVzLmxlbmd0 aDsgaSsrKSB7IC8vIHNlYXJjaCBlYWNoIHNlYXJjaGVyCiEgICAgICAgVG9wRG9jcyBkb2NzID0g c2VhcmNoYWJsZXNbaV0uc2VhcmNoKHF1ZXJ5LCBmaWx0ZXIsIG5Eb2NzKTsKISAgICAgICB0b3Rh bEhpdHMgKz0gZG9jcy50b3RhbEhpdHM7CQkgIC8vIHVwZGF0ZSB0b3RhbEhpdHMKISAgICAgICBT Y29yZURvY1tdIHNjb3JlRG9jcyA9IGRvY3Muc2NvcmVEb2NzOwohICAgICAgIGZvciAoaW50IGog PSAwOyBqIDwgc2NvcmVEb2NzLmxlbmd0aDsgaisrKSB7IC8vIG1lcmdlIHNjb3JlRG9jcyBpbnRv IGhxCiEgCVNjb3JlRG9jIHNjb3JlRG9jID0gc2NvcmVEb2NzW2pdOwohIAlpZiAoc2NvcmVEb2Mu c2NvcmUgPj0gbWluU2NvcmUpIHsKISAJICBzY29yZURvYy5kb2MgKz0gc3RhcnRzW2ldOwkJICAv LyBjb252ZXJ0IGRvYwohIAkgIGhxLnB1dChzY29yZURvYyk7CQkJICAvLyB1cGRhdGUgaGl0IHF1 
ZXVlCiEgCSAgaWYgKGhxLnNpemUoKSA+IG5Eb2NzKSB7CQkgIC8vIGlmIGhpdCBxdWV1ZSBvdmVy ZnVsbAohIAkgICAgaHEucG9wKCk7CQkJCSAgLy8gcmVtb3ZlIGxvd2VzdCBpbiBoaXQgcXVldWUK ISAJICAgIG1pblNjb3JlID0gKChTY29yZURvYylocS50b3AoKSkuc2NvcmU7IC8vIHJlc2V0IG1p blNjb3JlCiAgCSAgfQohIAl9IGVsc2UKISAJICBicmVhazsJCQkJICAvLyBubyBtb3JlIHNjb3Jl cyA+IG1pblNjb3JlCiAgICAgICAgfQogICAgICB9CiAgCiAgICAgIFNjb3JlRG9jW10gc2NvcmVE b2NzID0gbmV3IFNjb3JlRG9jW2hxLnNpemUoKV07CiEgICAgIGZvciAoaW50IGkgPSBocS5zaXpl KCktMTsgaSA+PSAwOyBpLS0pCSAgLy8gcHV0IGRvY3MgaW4gYXJyYXkKICAgICAgICBzY29yZURv Y3NbaV0gPSAoU2NvcmVEb2MpaHEucG9wKCk7CiAgCiEgICAgIHJldHVybiBuZXcgVG9wRG9jcyh0 b3RhbEhpdHMsIHNjb3JlRG9jcyk7CiAgICB9CiAgCiAgCi0tLSAxNDEsMTk4IC0tLS0KICAgICAg cmV0dXJuIG1heERvYzsKICAgIH0KICAKKyAKICAgIHB1YmxpYyBUb3BEb2NzIHNlYXJjaChRdWVy eSBxdWVyeSwgRmlsdGVyIGZpbHRlciwgaW50IG5Eb2NzKQogICAgICAgIHRocm93cyBJT0V4Y2Vw dGlvbiB7CiEgCiEgICAgIGNsYXNzIE11bHRpQ29sbGVjdG9yIGV4dGVuZHMgSGl0Q29sbGVjdG9y IHsKISAgICAgICBIaXRRdWV1ZSBocTsKISAgICAgICBpbnQgbkRvY3MgPSAwOwohICAgICAgIGlu dCB0b3RhbEhpdHMgPSAwOwohICAgICAgIGludCBzdGFydCA9IDA7CiEgICAgICAgZmxvYXQgbWlu U2NvcmUgPSAwLjBmOwohICAgICAgIFNjb3JlRG9jIHNjb3JlRG9jID0gbnVsbDsgLyogcmV1c2Ug bGFzdCBvbmUgZGlzY2FyZGVkIGZyb20gaGl0cXVldWUgaHEgKi8KISAKISAgICAgICBwdWJsaWMg TXVsdGlDb2xsZWN0b3IoaW50IG5kKSB7CiEgICAgICAgICBuRG9jcyA9IG5kOwohIAlocSA9IG5l dyBIaXRRdWV1ZShuZCk7CiEgICAgICAgfQohIAohICAgICAgIHB1YmxpYyB2b2lkIGNvbGxlY3Qo aW50IGRvYywgZmxvYXQgc2NvcmUpIHsKISAgICAgICAgIHRvdGFsSGl0cysrOwohICAgICAgICAg U3lzdGVtLm91dC5wcmludGxuKGdldENsYXNzKCkgKyAiIGhpdHM6ICIgKyB0b3RhbEhpdHMgKyAi LCBzdGFydDogIiArIHN0YXJ0CiEgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICsgIiwg ZG9jTnI6ICIgKyBkb2MgKyAiLCBzY29yZTogIiArIHNjb3JlKTsKISAgICAgICAgIGlmIChzY29y ZSA+PSBtaW5TY29yZSkgewohIAkgIGlmIChzY29yZURvYyA9PSBudWxsKSB7CiEgCSAgICBzY29y ZURvYyA9IG5ldyBTY29yZURvYyhkb2MgKyBzdGFydCwgc2NvcmUpOwohIAkgIH0gZWxzZSB7CiEg CSAgICBzY29yZURvYy5kb2MgPSBkb2MgKyBzdGFydDsKISAJICAgIHNjb3JlRG9jLnNjb3JlID0g 
c2NvcmU7CiEgCSAgfQohICAgICAgICAgICBocS5wdXQoc2NvcmVEb2MpOwohIAkgIGlmIChocS5z aXplKCkgPiBuRG9jcykgewohIAkgICAgc2NvcmVEb2MgPSAoU2NvcmVEb2MpIGhxLnBvcCgpOwoh IAkgICAgbWluU2NvcmUgPSAoKFNjb3JlRG9jKWhxLnRvcCgpKS5zY29yZTsKISAJICB9IGVsc2Ug ewohIAkgICAgc2NvcmVEb2MgPSBudWxsOwogIAkgIH0KISAJfQogICAgICAgIH0KICAgICAgfQog IAorICAgICBNdWx0aUNvbGxlY3RvciBtYyA9IG5ldyBNdWx0aUNvbGxlY3RvcihuRG9jcyk7Cisg CisgICAgIGZvciAoaW50IGkgPSAwOyBpIDwgc2VhcmNoYWJsZXMubGVuZ3RoOyBpKyspIHsKKyAg ICAgICBtYy5zdGFydCA9IHN0YXJ0c1tpXTsKKyAgICAgICBzZWFyY2hhYmxlc1tpXS5zZWFyY2go cXVlcnksIGZpbHRlciwgbWMpOworICAgICB9CisgCisgICAgIEhpdFF1ZXVlIGhxID0gbWMuaHE7 CiAgICAgIFNjb3JlRG9jW10gc2NvcmVEb2NzID0gbmV3IFNjb3JlRG9jW2hxLnNpemUoKV07CiEg ICAgIGZvciAoaW50IGkgPSBocS5zaXplKCktMTsgaSA+PSAwOyBpLS0pCiAgICAgICAgc2NvcmVE b2NzW2ldID0gKFNjb3JlRG9jKWhxLnBvcCgpOwogIAohICAgICByZXR1cm4gbmV3IFRvcERvY3Mo bWMudG90YWxIaXRzLCBzY29yZURvY3MpOwogICAgfQogIAogIAoqKioqKioqKioqKioqKioKKioq IDIwMSwyMDcgKioqKgogIAogICAgICB9CiAgICB9CiEgICAKICAgIHB1YmxpYyBRdWVyeSByZXdy aXRlKFF1ZXJ5IG9yaWdpbmFsKSB0aHJvd3MgSU9FeGNlcHRpb24gewogICAgICBRdWVyeVtdIHF1 ZXJpZXMgPSBuZXcgUXVlcnlbc2VhcmNoYWJsZXMubGVuZ3RoXTsKICAgICAgZm9yIChpbnQgaSA9 IDA7IGkgPCBzZWFyY2hhYmxlcy5sZW5ndGg7IGkrKykgewotLS0gMjI0LDIzMCAtLS0tCiAgCiAg ICAgIH0KICAgIH0KISAKICAgIHB1YmxpYyBRdWVyeSByZXdyaXRlKFF1ZXJ5IG9yaWdpbmFsKSB0 aHJvd3MgSU9FeGNlcHRpb24gewogICAgICBRdWVyeVtdIHF1ZXJpZXMgPSBuZXcgUXVlcnlbc2Vh cmNoYWJsZXMubGVuZ3RoXTsKICAgICAgZm9yIChpbnQgaSA9IDA7IGkgPCBzZWFyY2hhYmxlcy5s ZW5ndGg7IGkrKykgewo= --------------Boundary-00=_QJ7ROP4BVZCQUY21NGKI Content-Type: text/plain; charset=us-ascii --------------------------------------------------------------------- To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org For additional commands, e-mail: lucene-dev-help@jakarta.apache.org --------------Boundary-00=_QJ7ROP4BVZCQUY21NGKI--