lucene-dev mailing list archives

From bugzi...@apache.org
Subject DO NOT REPLY [Bug 31841] - [PATCH] MultiSearcher problems with Similarity.docFreq()
Date Mon, 21 Feb 2005 17:49:14 GMT
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=31841>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=31841

------- Additional Comments From cutting@apache.org  2005-02-21 18:49 -------
This looks good.  Thanks!

A few comments:

Originally there was no Weight in Lucene, only Query and Scorer.  Weight was
added in order to make it so that searching did not modify a Query, so that a
Query instance could be reused.  Searcher-dependent state of the query is meant
to reside in the Weight.  IndexReader dependent state resides in the Scorer. 
Your "freezing" a query violates this.  Can't we create the weight once in
Searcher.search?
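To make the Query/Weight/Scorer split concrete, here is a minimal sketch under simplifying assumptions: `ToyTermQuery`, `ToyWeight`, `ToyScorer` and `ToySearcher` are hypothetical stand-ins, not Lucene's real classes, and the idf formula is only illustrative. The query stays immutable, searcher-dependent state (an idf) goes into a Weight created once per `search` call, and reader-dependent state lives in the Scorer.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, heavily simplified model of the Query/Weight/Scorer split.
// These are NOT Lucene's real classes; names, signatures and the idf formula are illustrative.

/** Immutable query: holds only what the user asked for, so one instance can be reused. */
final class ToyTermQuery {
    final String term;

    ToyTermQuery(String term) { this.term = term; }

    /** Searcher-dependent state (here an idf) goes into a new Weight; the query is never modified. */
    ToyWeight createWeight(ToySearcher searcher) {
        int df = searcher.docFreq(term);
        double idf = Math.log(searcher.maxDoc() / (double) (df + 1)) + 1.0;
        return new ToyWeight(this, idf);
    }
}

/** Holds the searcher-dependent state for one search. */
final class ToyWeight {
    final ToyTermQuery query;
    final double idf;

    ToyWeight(ToyTermQuery query, double idf) { this.query = query; this.idf = idf; }

    /** Reader-dependent state belongs to the Scorer. */
    ToyScorer scorer(List<String> readerDocs) { return new ToyScorer(this, readerDocs); }
}

final class ToyScorer {
    private final ToyWeight weight;
    private final List<String> docs;

    ToyScorer(ToyWeight weight, List<String> docs) { this.weight = weight; this.docs = docs; }

    /** tf * idf, using a naive whitespace tokenizer. */
    double score(int doc) {
        long tf = Arrays.stream(docs.get(doc).split("\\s+"))
                        .filter(weight.query.term::equals)
                        .count();
        return tf * weight.idf;
    }
}

final class ToySearcher {
    private final List<String> docs;

    ToySearcher(List<String> docs) { this.docs = docs; }

    int maxDoc() { return docs.size(); }

    int docFreq(String term) {
        int df = 0;
        for (String d : docs) {
            if (Arrays.asList(d.split("\\s+")).contains(term)) df++;
        }
        return df;
    }

    /** The Weight is created exactly once per search, then used to build the Scorer. */
    double[] search(ToyTermQuery query) {
        ToyWeight weight = query.createWeight(this);
        ToyScorer scorer = weight.scorer(docs);
        double[] scores = new double[docs.size()];
        for (int i = 0; i < docs.size(); i++) scores[i] = scorer.score(i);
        return scores;
    }
}
```

Because `search` builds a fresh `ToyWeight` from an unmodified `ToyTermQuery`, the same query instance can be run against several searchers, each producing its own idf.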

CachedDfSource does not need to be public, does it?

We need to think about back-compatibility.  Folks have implementations of Query,
Weight, Similarity and Scorer.  So, when a public API changes we need to
deprecate, not remove, old methods, and try hard to make the old version still
work.  So, for example, we need to figure out how to handle the case where folks
have implemented the old Similarity.idf() methods.
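One common way to evolve a public API without breaking subclasses is sketched below, assuming a hypothetical `DfLookup` interface in place of whatever new doc-freq abstraction is adopted; `CompatSimilarity` and its idf formula are illustrative, not Lucene's actual `Similarity`. The old method is deprecated rather than removed, and the new overload's default delegates to it, so a subclass that only overrides the old method still takes effect.

```java
// Hypothetical sketch of deprecating (not removing) an old extension point.
// DfLookup, CompatSimilarity and the idf formula are illustrative, not Lucene's API.

/** Stand-in for a new doc-frequency abstraction. */
interface DfLookup {
    int docFreq(String term);
    int maxDoc();
}

abstract class CompatSimilarity {
    /**
     * Old extension point, kept so existing subclasses still compile and still take effect.
     * @deprecated override {@link #idf(String, DfLookup)} instead.
     */
    @Deprecated
    public float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    /** New extension point; its default delegates to the old method, so old overrides are honored. */
    @SuppressWarnings("deprecation")
    public float idf(String term, DfLookup source) {
        return idf(source.docFreq(term), source.maxDoc());
    }
}

/** Written against the old API: overrides only the deprecated method. */
class LegacySimilarity extends CompatSimilarity {
    @Override
    public float idf(int docFreq, int numDocs) {
        return 42.0f;
    }
}

/** Uses the built-in behavior unchanged. */
class PlainSimilarity extends CompatSimilarity {}
```

The cost of this pattern is that both overloads must be kept consistent until the deprecated one is finally dropped in a later release.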

You no longer call Query.getSimilarity(Searcher).  That method permits queries
to override the Searcher's Similarity implementation.  Is there a reason you do
this?  We should be computing DFs once for the whole query tree, but it should
still be possible to compute, e.g., IDFs independently per node, no?
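Computing DFs once for the whole query tree while still letting each node choose its own IDF can be sketched as follows. `QueryNode`, `IdfPolicy` and `TreeIdf` are hypothetical; `IdfPolicy` stands in for a node-specific Similarity (such as one returned by an overridden `Query.getSimilarity(Searcher)`), and the `docFreq` function stands in for the possibly remote searcher lookup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.ToIntFunction;

// Hypothetical sketch: one df lookup per distinct term, then per-node idf computation.

/** Stand-in for a node-specific Similarity's idf method. */
interface IdfPolicy {
    double idf(int docFreq, int numDocs);
}

/** A leaf holds a term; an inner node holds children. A null override inherits the parent's policy. */
final class QueryNode {
    final String term;
    final List<QueryNode> children;
    final IdfPolicy override;

    private QueryNode(String term, List<QueryNode> children, IdfPolicy override) {
        this.term = term; this.children = children; this.override = override;
    }

    static QueryNode leaf(String term, IdfPolicy override) {
        return new QueryNode(term, List.of(), override);
    }

    static QueryNode and(QueryNode... kids) {
        return new QueryNode(null, Arrays.asList(kids), null);
    }

    void collectTerms(Set<String> out) {
        if (term != null) out.add(term);
        for (QueryNode c : children) c.collectTerms(out);
    }
}

final class TreeIdf {
    /** Step 1: gather every distinct term in the tree and look up each df exactly once. */
    static Map<String, Integer> dfsForTree(QueryNode root, ToIntFunction<String> docFreq) {
        Set<String> terms = new TreeSet<>();
        root.collectTerms(terms);
        Map<String, Integer> dfs = new HashMap<>();
        for (String t : terms) dfs.put(t, docFreq.applyAsInt(t));
        return dfs;
    }

    /** Step 2: each leaf computes its idf from the shared dfs with its own (or inherited) policy. */
    static List<Double> leafIdfs(QueryNode root, Map<String, Integer> dfs,
                                 int numDocs, IdfPolicy searcherDefault) {
        List<Double> out = new ArrayList<>();
        walk(root, dfs, numDocs, searcherDefault, out);
        return out;
    }

    private static void walk(QueryNode n, Map<String, Integer> dfs, int numDocs,
                             IdfPolicy inherited, List<Double> out) {
        IdfPolicy policy = (n.override != null) ? n.override : inherited;
        if (n.term != null) out.add(policy.idf(dfs.get(n.term), numDocs));
        for (QueryNode c : n.children) walk(c, dfs, numDocs, policy, out);
    }
}
```

Here the expensive part (the df lookups) depends only on the set of terms, while the cheap per-node part is free to vary.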

I also wonder if, instead of adding DocFreqSource, we could still use the
Searcher.  MultiSearcher could keep an LRU cache of total doc freqs, implemented
with LinkedHashMap, for the last few thousand search terms.  That would be a far
less invasive change, and hence less likely to break folks.  Or am I missing
something?
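A minimal sketch of the suggested LRU cache, assuming each sub-searcher can be modeled as a function from term to doc freq (`DocFreqCache` is a hypothetical name, not a Lucene class). It relies on the standard `LinkedHashMap(int, float, boolean)` constructor with access ordering plus an overridden `removeEldestEntry`, which is the documented way to build an LRU map from `LinkedHashMap`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;

/** Hypothetical sketch: caches doc freqs summed across sub-searchers. Not Lucene's MultiSearcher. */
final class DocFreqCache {
    private final List<ToIntFunction<String>> subSearchers;
    private final Map<String, Integer> lru;
    private int misses = 0;

    DocFreqCache(List<ToIntFunction<String>> subSearchers, final int maxEntries) {
        this.subSearchers = subSearchers;
        // accessOrder = true: iteration runs from least- to most-recently accessed,
        // so the "eldest" entry is the least recently used one.
        this.lru = new LinkedHashMap<String, Integer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Integer> eldest) {
                return size() > maxEntries; // evict once the cap is exceeded
            }
        };
    }

    /** Total doc freq of a term over all sub-searchers, computed at most once while cached. */
    synchronized int docFreq(String term) {
        Integer cached = lru.get(term); // get() also marks the entry as recently used
        if (cached != null) return cached;
        misses++;
        int total = 0;
        for (ToIntFunction<String> s : subSearchers) total += s.applyAsInt(term);
        lru.put(term, total);
        return total;
    }

    synchronized int misses() { return misses; }

    synchronized int size() { return lru.size(); }
}
```

A real cache would also need to be cleared whenever any underlying index changes, since the cached totals would otherwise go stale; the coarse `synchronized` methods are just the simplest choice for concurrent searches.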

Sorry if I seem picky, but this is core stuff in Lucene and affects a lot of people.

-- 
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

