From: "Chuck Williams"
To: "Lucene Developers List"
Subject: Ranking with MultiSearcher -- WAS RE: Normalized Scoring -- was RE: idf and explain(), was Re: Search and Scoring
Date: Thu, 21 Oct 2004 17:31:44 -0700

A simple solution occurred to me and I couldn't resist trying it. The
attached files fix Daniel's test, which now returns the same scores in
both cases, and they don't break my app. I doubt this is the best fix
-- see issues below.

These are modified 1.4.2 source files. The changes are:

  1. Searcher: Add a topmostSearcher field with getter and setter to
     record the outermost Searcher. Default to this.
  2. MultiSearcher: Pass down the topmostSearcher when creating the
     subsearchers.
  3. IndexSearcher: Call Query.weight() everywhere with the
     topmostSearcher instead of this.
  4. Query: Provide a default implementation of Query.combine() so
     that MultiSearcher works with all queries.

Problems or possible problems I see:

  1. This does not address the same issue with RemoteSearchable.
     RemoteSearchable is not a Searcher, nor can it be due to lack of
     multiple inheritance in Java, but Query.weight() requires a Searcher.
     Perhaps Query.weight() should be changed to take a Searchable, but
     this requires changing many places and I suspect would break apps.
  2. There may be other places where topmostSearcher should be used
     instead of this.
  3. The default implementation for Query.combine() is a guess on my
     part - it works for TermQuery. It's fragile in that the default
     implementation will hide bugs caused by queries that inadvertently
     omit a more precise Query.combine() method.
  4. The prior comment on Query.combine() indicates that whoever wrote
     it was fully aware of this problem and so probably had another usage
     in mind, so the whole issue may just be Daniel's usage in the test
     case. It's not apparent to me, so I probably don't understand
     something.
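To make the idea behind changes 1-3 concrete, here is a minimal toy
sketch. It is not the attached patch and uses none of Lucene's real
classes: ToySearcher, ToyIndexSearcher, ToyMultiSearcher and the
document counts are all hypothetical, chosen only to show how routing
docFreq()/maxDoc() through the outermost searcher makes every
subsearcher see the same idf. The idf formula is the one I understand
DefaultSimilarity to use.

```java
import java.util.HashMap;
import java.util.Map;

public class TopmostSearcherSketch {

    // Hypothetical stand-in for Searcher (NOT Lucene's class).
    static abstract class ToySearcher {
        private ToySearcher topmost = this;          // "Default to this."

        ToySearcher getTopmostSearcher() { return topmost; }
        void setTopmostSearcher(ToySearcher s) { topmost = s; }

        abstract int docFreq(String term);
        abstract int maxDoc();

        // idf is always computed against the topmost searcher's statistics.
        float idf(String term) {
            ToySearcher top = getTopmostSearcher();
            return (float) (Math.log(top.maxDoc() / (double) (top.docFreq(term) + 1)) + 1.0);
        }
    }

    // Hypothetical single-index searcher with fixed, made-up statistics.
    static class ToyIndexSearcher extends ToySearcher {
        private final Map<String, Integer> docFreqs;
        private final int maxDoc;

        ToyIndexSearcher(Map<String, Integer> docFreqs, int maxDoc) {
            this.docFreqs = docFreqs;
            this.maxDoc = maxDoc;
        }

        int docFreq(String term) {
            Integer n = docFreqs.get(term);
            return n == null ? 0 : n;
        }

        int maxDoc() { return maxDoc; }
    }

    // Hypothetical multi-index searcher: sums statistics over its subsearchers
    // and tells each of them who the outermost searcher is.
    static class ToyMultiSearcher extends ToySearcher {
        private final ToySearcher[] subs;

        ToyMultiSearcher(ToySearcher... subs) {
            this.subs = subs;
            for (ToySearcher s : subs) s.setTopmostSearcher(this);
        }

        // Propagate further down if this searcher is itself wrapped.
        @Override
        void setTopmostSearcher(ToySearcher s) {
            super.setTopmostSearcher(s);
            for (ToySearcher sub : subs) sub.setTopmostSearcher(s);
        }

        int docFreq(String term) {
            int sum = 0;
            for (ToySearcher s : subs) sum += s.docFreq(term);
            return sum;
        }

        int maxDoc() {
            int sum = 0;
            for (ToySearcher s : subs) sum += s.maxDoc();
            return sum;
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> f1 = new HashMap<>();
        f1.put("three", 3);                          // made-up counts
        Map<String, Integer> f2 = new HashMap<>();
        f2.put("three", 1);

        ToyIndexSearcher s1 = new ToyIndexSearcher(f1, 3);
        ToyIndexSearcher s2 = new ToyIndexSearcher(f2, 3);
        System.out.println("local idf:  " + s1.idf("three") + " vs " + s2.idf("three"));

        ToyMultiSearcher multi = new ToyMultiSearcher(s1, s2);
        System.out.println("global idf: " + s1.idf("three") + " vs " + s2.idf("three")
                + " (multi: " + multi.idf("three") + ")");
    }
}
```

Before wrapping, the two toy searchers report different idf's for the
same term; after wrapping, both agree with the multi searcher. The real
change has more to it (Query.weight(), Query.combine()), which this toy
does not attempt to model.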
Chuck

> -----Original Message-----
> From: Chuck Williams
> Sent: Thursday, October 21, 2004 3:11 PM
> To: 'Lucene Developers List'
> Subject: RE: Normalized Scoring -- was RE: idf and explain(), was Re:
> Search and Scoring
>
> The idf's are indeed computed locally, but I believe it is a simple bug
> in MultiSearcher. The attached version of the test adds explain()'s to
> verify the problem is the idf's (and changes the Field construction to
> something that works in my 1.4.2 sources).
>
> MultiSearcher.search() calls the separate searchers for each index.
> That makes the IndexSearcher the current searcher when Similarity.idf()
> is reached. Thus IndexSearcher.docFreq() is used instead of
> MultiSearcher.docFreq(), yielding the index-local idf's.
>
> The best fix is not obvious to me, but it is just a code-structure issue.
>
> Chuck
>
> > -----Original Message-----
> > From: Daniel Naber [mailto:daniel.naber@t-online.de]
> > Sent: Thursday, October 21, 2004 2:35 PM
> > To: Lucene Developers List
> > Subject: Re: Normalized Scoring -- was RE: idf and explain(), was Re:
> > Search and Scoring
> >
> > On Thursday 21 October 2004 23:03, Doug Cutting wrote:
> >
> > > Idf's are already computed globally across all indexes. Tf's are local
> > > to the document. In short, scores from a MultiSearcher are the same as
> > > when searching an IndexReader with the same documents.
> >
> > That doesn't seem to be the case in the attached test -- am I using
> > MultiSearcher in the wrong way or what might be the problem?
> >
> > The output of the attached test is:
> >
> > 1+2 searched with Multisearcher:
> > two blah three score=0.70273256
> > one blah three score=0.35615897
> > one foo three score=0.35615897
> > one foobar three score=0.35615897
> >
> > 1+2 indexed together:
> > one blah three score=0.5911608
> > one foo three score=0.5911608
> > one foobar three score=0.5911608
> > two blah three score=0.5911608
> >
> > --
> > http://www.danielnaber.de
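
As a rough check on the scores quoted above, the sketch below recomputes
the three distinct values from the idf formula I understand
DefaultSimilarity to use, idf = ln(numDocs / (docFreq + 1)) + 1. The
document counts are assumptions, not taken from the test itself: the
numbers happen to line up if each index holds 3 documents, the query
term occurs in 3 documents of index 1 and in 1 document of index 2, and
the remaining per-document factor (tf times field norm) is 0.5 for every
hit. IdfSketch and score() are hypothetical names for illustration only.

```java
public class IdfSketch {

    // Same shape as the idf(docFreq, numDocs) formula described above.
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // ASSUMPTION: for a single-term query the score reduces to
    // idf * (tf * fieldNorm), with tf * fieldNorm taken to be 0.5.
    static float score(int docFreq, int numDocs) {
        return idf(docFreq, numDocs) * 0.5f;
    }

    public static void main(String[] args) {
        // Index-local statistics (what the subsearchers appear to use today).
        System.out.println("index 2 alone (docFreq=1, numDocs=3): " + score(1, 3));
        System.out.println("index 1 alone (docFreq=3, numDocs=3): " + score(3, 3));
        // Global statistics (what a single combined index uses).
        System.out.println("combined      (docFreq=4, numDocs=6): " + score(4, 6));
    }
}
```

Under those assumptions the three values come out close to 0.7027,
0.3562 and 0.5912, which is at least consistent with the explanation
that the subsearchers are using index-local docFreq and numDocs.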