lucene-dev mailing list archives

From bugzi...@apache.org
Subject DO NOT REPLY [Bug 31841] - [PATCH] MultiSearcher problems with Similarity.docFreq()
Date Mon, 15 Nov 2004 12:26:43 GMT
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=31841>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=31841


siberski@learninglab.de changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #13413|0                           |1
        is obsolete|                            |




------- Additional Comments From siberski@learninglab.de  2004-11-15 13:24 -------
Created an attachment (id=13464)
 --> (http://issues.apache.org/bugzilla/attachment.cgi?id=13464&action=view)
Complete patch to allow specifying the Similarity on a query-by-query basis

This is a follow-up to my patch/comment from 2004-11-12. The attached patch is
now complete, i.e. everything compiles and no test case is broken by it.
The main idea is that a similarity attribute and a setSimilarity() method are
added to Query. If the similarity is not set, the query uses the Searcher's
similarity as before. However, if a Similarity is set on the query, it takes
precedence.
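The precedence rule can be illustrated with a minimal sketch. The Similarity, DefaultSim, Searcher and Query types below are hypothetical, heavily simplified stand-ins (not Lucene's real classes and not the attached patch), and the idf formula in DefaultSim is only an assumed tf-idf style example.

```java
// Hypothetical, simplified stand-ins; NOT Lucene's real API or the patch.
interface Similarity {
    float idf(int docFreq, int maxDoc);
}

// Illustrative default: an assumed idf of the form log(maxDoc/(docFreq+1)) + 1.
class DefaultSim implements Similarity {
    public float idf(int docFreq, int maxDoc) {
        return (float) (Math.log(maxDoc / (double) (docFreq + 1)) + 1.0);
    }
}

class Searcher {
    private Similarity similarity = new DefaultSim();
    Similarity getSimilarity() { return similarity; }
    void setSimilarity(Similarity s) { similarity = s; }
}

class Query {
    // null means "not set": fall back to the searcher's similarity.
    private Similarity similarity;

    void setSimilarity(Similarity s) { similarity = s; }

    // A Similarity set on the query takes precedence over the searcher's.
    Similarity getSimilarity(Searcher searcher) {
        return similarity != null ? similarity : searcher.getSimilarity();
    }
}
```

Keeping the attribute null by default means existing code that never calls setSimilarity() on a query behaves exactly as before.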
To solve the MultiSearcher issue, I have provided two different Similarities:
- MultiSimilarity delegates the docFreq() and maxDoc() calls to the
MultiSearcher, thus retrieving the sum over all registered searchers.
This Similarity always 'gets it right', but it does not work with
RemoteSearchables, because it has to call back into the local MultiSearcher.
- DfMapSimilarity analyses a query and caches all necessary docFreq values.
This Similarity is Serializable and therefore works with RemoteSearchables,
too. However, it is not able to handle queries where the term set is not known
beforehand, e.g. wildcard queries.
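The two strategies can be contrasted in a small sketch. Everything here is hypothetical: DocFreqSource, ToySearcher, ToyMultiSearcher, LiveMultiSimilarity and CachedDfSimilarity are illustrative stand-ins, not the patch's MultiSimilarity/DfMapSimilarity or Lucene's API, and terms are plain strings. The live variant asks the multi-searcher on every call; the cached variant snapshots the summed docFreq of the query's known terms into a serializable map and fails for terms it was not given up front.

```java
import java.io.Serializable;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interface mimicking only the statistics a Similarity needs.
interface DocFreqSource {
    int docFreq(String term);
    int maxDoc();
}

// A toy sub-index: term -> number of documents containing it.
class ToySearcher implements DocFreqSource {
    private final Map<String, Integer> dfs;
    private final int maxDoc;
    ToySearcher(Map<String, Integer> dfs, int maxDoc) { this.dfs = dfs; this.maxDoc = maxDoc; }
    public int docFreq(String term) { return dfs.getOrDefault(term, 0); }
    public int maxDoc() { return maxDoc; }
}

// Sums the statistics over all registered sub-searchers.
class ToyMultiSearcher implements DocFreqSource {
    private final List<DocFreqSource> searchers;
    ToyMultiSearcher(List<DocFreqSource> searchers) { this.searchers = searchers; }
    public int docFreq(String term) {
        int sum = 0;
        for (DocFreqSource s : searchers) sum += s.docFreq(term);
        return sum;
    }
    public int maxDoc() {
        int sum = 0;
        for (DocFreqSource s : searchers) sum += s.maxDoc();
        return sum;
    }
}

// MultiSimilarity-like: always up to date, but needs a live local reference.
class LiveMultiSimilarity {
    private final ToyMultiSearcher multi;
    LiveMultiSimilarity(ToyMultiSearcher multi) { this.multi = multi; }
    int docFreq(String term) { return multi.docFreq(term); }
    int maxDoc() { return multi.maxDoc(); }
}

// DfMapSimilarity-like: a serializable snapshot of the global docFreqs
// for the terms known when the query is analysed.
class CachedDfSimilarity implements Serializable {
    private static final long serialVersionUID = 1L;
    private final HashMap<String, Integer> dfMap = new HashMap<>();
    private final int maxDoc;

    CachedDfSimilarity(ToyMultiSearcher multi, Collection<String> queryTerms) {
        for (String t : queryTerms) dfMap.put(t, multi.docFreq(t));
        maxDoc = multi.maxDoc();
    }

    int docFreq(String term) {
        Integer df = dfMap.get(term);
        if (df == null) {
            // e.g. a term produced by wildcard expansion, unknown up front
            throw new IllegalStateException("no cached docFreq for term: " + term);
        }
        return df;
    }

    int maxDoc() { return maxDoc; }
}
```

With one sub-index holding 3 documents that contain a term and another holding 5, both variants report a docFreq of 8, whereas each sub-searcher on its own would see only 3 or 5; that per-index discrepancy is the kind of inconsistency behind the MultiSearcher scoring problem.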

Both problems mentioned in my previous comment (thread safety and remote
searcher compatibility) are solved by this patch. All test cases pass
unchanged, with the exception of one test case (TestSort.testNormalizedScores())
that had previously been tweaked to accommodate the incorrect MultiSearcher
behavior and now works as expected.

Problems:
- DfMapSimilarity.collectDfs() contains a lot of ugly casts to Query
  subclasses. This could be avoided by adding another abstract method to
  Query, but it is unclear whether this is really the better solution.
- In this patch the choice of DfMapSimilarity is hard-coded into
  MultiSearcher. This should be made configurable.
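The alternative mentioned in the first point (an extra abstract method on Query instead of casts in collectDfs()) might look roughly like the sketch below. SketchQuery, SketchTermQuery, SketchBooleanQuery, SketchWildcardQuery, collectTerms() and TermCollector are hypothetical names chosen for illustration; they are not part of the patch or of Lucene.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical: each query type reports its own terms, so the collector
// needs no instanceof checks or casts.
abstract class SketchQuery {
    // Adds every term this query will score against to 'terms'.
    abstract void collectTerms(Set<String> terms);
}

class SketchTermQuery extends SketchQuery {
    private final String term;
    SketchTermQuery(String term) { this.term = term; }
    void collectTerms(Set<String> terms) { terms.add(term); }
}

class SketchBooleanQuery extends SketchQuery {
    private final List<SketchQuery> clauses = new ArrayList<>();
    SketchBooleanQuery add(SketchQuery q) { clauses.add(q); return this; }
    void collectTerms(Set<String> terms) {
        for (SketchQuery q : clauses) q.collectTerms(terms); // recurse, no casts
    }
}

class SketchWildcardQuery extends SketchQuery {
    private final String pattern;
    SketchWildcardQuery(String pattern) { this.pattern = pattern; }
    void collectTerms(Set<String> terms) {
        // Matching terms depend on the index contents, so they cannot be
        // listed from the query alone (the DfMapSimilarity limitation).
        throw new UnsupportedOperationException("terms unknown before expansion: " + pattern);
    }
}

final class TermCollector {
    // Gathers the distinct terms of a query tree in one polymorphic call.
    static Set<String> termsOf(SketchQuery q) {
        Set<String> terms = new HashSet<>();
        q.collectTerms(terms);
        return terms;
    }
}
```

The trade-off is the one noted above: every Query subclass must implement the new method, in exchange for removing the type switch from the Similarity.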


-- 
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

