lucene-dev mailing list archives

From Doug Cutting
Subject Re: How to proceed with Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?
Date Mon, 21 Feb 2005 18:00:22 GMT
Wolf Siberski wrote:
> Now I found another solution which requires more changes, but IMHO is
> much cleaner:
> - when a query computes its Weight, it caches it in an attribute
> - a query can be 'frozen'. A frozen query always returns the cached
>   Weight when calling Query.weight().

Originally there was no Weight in Lucene, only Query and Scorer.  Weight 
was added in order to make it so that searching did not modify a Query, 
so that a Query instance could be reused.  Searcher-dependent state of 
the query is meant to reside in the Weight.  IndexReader dependent state 
resides in the Scorer.  Your "freezing" a query violates this.  Can't we 
create the weight once in

> This approach requires that weights can be serialized. Interestingly,
> Weight already implements Serializable, but the current implementation
> doesn't work for all weight classes. The reason is that some weights
> hold a reference to a searcher which is of course not serializable.
> We can't make it transient either, because this searcher is the source
> of the Similarity needed by scorers.
> On closer look it turned out that the searcher is used only for two
> things: as source for a Similarity, and as docFreqs&maxDoc source.
> docFreq&maxDoc are only necessary to initialize the weights, but not
> needed by scorers. So instead of providing the Searcher, I now provide
> a Similarity and a DocFreqSource to the weights. Only the Similarity is
> stored by weights.

We need to make sure, however, that this is the correct Similarity.  It 
should still be the result of Query.getSimilarity(Searcher), which 
doesn't appear to be the case in your patch.

As for DocFreqSource versus Searcher, couldn't the Searcher be passed as 
a source for docFreqs and simply have Weights not keep a pointer to 
it?  This isn't a big deal, but it would substantially minimize the API 

> As (IMHO) positive side effect, Similarity got rid of
> Searcher dependencies, which leads to a better split of responsibilities:
> - Similarity only provides scoring formulas
> - Searcher (rsp. DocFreqSource) provides the raw data (tf/df/maxDoc)
> This change affects quite a few classes (because the createWeight()
> signature is changed), but the modifications are pretty straightforward.

But couldn't the signature change be avoided if the Weight constructors 
immediately call Query.getSimilarity(Searcher) to get their Similarity, 
and no longer kept a pointer to the Searcher?
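
To make that suggestion concrete, here is a minimal sketch in Java.  It 
uses simplified stand-in classes, not the real Lucene ones: every name 
ending in "Sketch" is hypothetical, and the idf formula is only 
illustrative.  The point is that createWeight() still takes the searcher, 
but the weight's constructor reads the Similarity and the docFreq/maxDoc 
figures right away and stores no reference to the searcher, so the weight 
can be serialized.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Demo driver. All types in this file are hypothetical stand-ins, not Lucene's API.
final class WeightSketch {
    // A searcher with fixed statistics; the anonymous class is not Serializable.
    static SearcherSketch fixedSearcher(final int df, final int docs) {
        final SimilaritySketch sim = new DefaultSimilaritySketch();
        return new SearcherSketch() {
            public int docFreq(String term) { return df; }
            public int maxDoc() { return docs; }
            public SimilaritySketch getSimilarity() { return sim; }
        };
    }

    // Serializes and deserializes a weight; fails if it still references a searcher.
    static TermWeightSketch roundTrip(TermWeightSketch weight) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(weight);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (TermWeightSketch) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        TermQuerySketch query = new TermQuerySketch("lucene");
        TermWeightSketch weight = query.createWeight(fixedSearcher(3, 100));
        TermWeightSketch copy = roundTrip(weight);
        System.out.println("idf before=" + weight.idf + " after=" + copy.idf);
    }
}

// Scoring formulas only; serializable so a weight can carry it along.
interface SimilaritySketch extends Serializable {
    float idf(int docFreq, int maxDoc);
}

class DefaultSimilaritySketch implements SimilaritySketch {
    public float idf(int docFreq, int maxDoc) {
        // Illustrative formula, assumed for this sketch.
        return (float) (Math.log(maxDoc / (double) (docFreq + 1)) + 1.0);
    }
}

// Source of raw statistics; deliberately not Serializable.
interface SearcherSketch {
    int docFreq(String term);
    int maxDoc();
    SimilaritySketch getSimilarity();
}

class TermQuerySketch implements Serializable {
    final String term;

    TermQuerySketch(String term) { this.term = term; }

    // Stand-in for Query.getSimilarity(Searcher): the query picks the Similarity.
    SimilaritySketch getSimilarity(SearcherSketch searcher) {
        return searcher.getSimilarity();
    }

    // Signature unchanged: the searcher is still the argument.
    TermWeightSketch createWeight(SearcherSketch searcher) {
        return new TermWeightSketch(this, searcher);
    }
}

class TermWeightSketch implements Serializable {
    final TermQuerySketch query;
    final SimilaritySketch similarity;
    final float idf;

    TermWeightSketch(TermQuerySketch query, SearcherSketch searcher) {
        this.query = query;
        // Use the searcher immediately, then let go of it.
        this.similarity = query.getSimilarity(searcher);
        this.idf = similarity.idf(searcher.docFreq(query.term), searcher.maxDoc());
    }
}
```

In this sketch the query itself is never modified, and the only 
searcher-derived state the weight keeps is the Similarity and the 
precomputed idf.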

> From my point of view, the patch submitted now is a sound solution
> for Bug 31841 (at least I like it :-) ).
> The next thing which IMHO needs to be done is a review by someone else.

I've made a quick review, but it would be nice if others looked at this too.

Thanks again for all your work here!

