lucene-dev mailing list archives

From Shailesh Kochhar <shailesh.koch...@gmail.com>
Subject Re: setSimilarity on Query
Date Tue, 13 Nov 2007 04:50:54 GMT
Chris Hostetter wrote:
> independent of the QueryParser aspects of your question, adding a 
> setSimilarity method to the Query class would be a complete 180 of how it 
> currently works right now.
> 
> Query classes have to have a getSimilarity method so that their 
> Weight/Scorer have a way to access the similarity functions ... but every 
> core type of query gets that similarity from the searcher being used when 
> the query is executed.
> 
> if the Query class defined a "setSimilarity" then the similarity used by 
> one query in a BooleanQuery might not be the same as another query in the 
> same query structure ... queryNorms, idfs, tfs ... could all be completely 
> nonsensical.

The getSimilarity() implementation in Query actually invokes 
Searcher.getSimilarity(), which in turn returns Similarity.getDefault() 
unless a different Similarity has been set on the searcher.

IndexSearcher has a corresponding setSimilarity() method that 
overrides the returned value, which makes it convenient for what 
you're trying to accomplish.
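
To make the delegation concrete, here is a rough model of that chain in 
plain Java. The *Sketch classes are simplified stand-ins I made up for 
illustration, not the real Lucene classes; only the shape of the calls 
(query asks searcher, searcher falls back to a default) is taken from 
the description above.

```java
// Simplified stand-ins for the classes discussed above; NOT the Lucene API.
interface SimilaritySketch {
    String name();
}

class SearcherSketch {
    // Stand-in for Similarity.getDefault().
    static final SimilaritySketch DEFAULT = () -> "default";

    private SimilaritySketch similarity = DEFAULT;

    void setSimilarity(SimilaritySketch s) { this.similarity = s; }

    SimilaritySketch getSimilarity() { return similarity; }
}

class QuerySketch {
    // The query holds no Similarity of its own; it asks the searcher.
    SimilaritySketch getSimilarity(SearcherSketch searcher) {
        return searcher.getSimilarity();
    }
}

class DelegationDemo {
    // Returns the name of the Similarity a query would see.
    static String similarityUsed(boolean override) {
        SearcherSketch searcher = new SearcherSketch();
        if (override) {
            searcher.setSimilarity(() -> "custom");
        }
        return new QuerySketch().getSimilarity(searcher).name();
    }

    public static void main(String[] args) {
        System.out.println(similarityUsed(false));
        System.out.println(similarityUsed(true));
    }
}
```

Because every query in the tree asks the same searcher, they all see the 
same Similarity, which is the consistency property Chris describes.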

There is, however, another point of discord -- which is the Weight 
associated with the Query (which is relevant if you want a different 
implementation of term weighting). Here the locus of control is inverted 
-- it is the Searcher which delegates to the Query in order to create 
the Weight. In order to change the scoring implementation one needs to 
implement a new Query class, a new Weight class, a new Similarity class 
and a new QueryParser.

A friendlier alternative I'd like to propose is a sort of Weight and 
Similarity factory which is provided either to the top level Query 
object that is returned from parsing -- or to the Searcher object that 
processes the query. The factory can then return Similarity and Weight 
implementations that are identical for all parts of the query and which 
are mutually consistent.

This would allow field specific Similarity and Weight implementations 
and would also be backwards compatible.
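
A minimal sketch of what I have in mind, in plain Java. All of the names 
here (ScoringFactory, FieldSimilarity, FieldWeight, 
PerFieldScoringFactory) are hypothetical and do not exist in Lucene, and 
the scoring is reduced to a single tf() call to keep it short. Fields 
with no registered implementation fall back to a default, which is how I 
would expect backwards compatibility to be preserved.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins; none of these types exist in Lucene.
interface FieldSimilarity {
    float tf(int freq);
}

interface FieldWeight {
    float score(int freq);
}

// Proposed factory: the single source of both Similarity and Weight.
interface ScoringFactory {
    FieldSimilarity similarityFor(String field);
    FieldWeight weightFor(String field);
}

class PerFieldScoringFactory implements ScoringFactory {
    private final Map<String, FieldSimilarity> perField = new HashMap<>();

    // Fallback for fields with no registered implementation (assumed default).
    private final FieldSimilarity fallback = freq -> (float) Math.sqrt(freq);

    PerFieldScoringFactory register(String field, FieldSimilarity sim) {
        perField.put(field, sim);
        return this;
    }

    @Override
    public FieldSimilarity similarityFor(String field) {
        return perField.getOrDefault(field, fallback);
    }

    @Override
    public FieldWeight weightFor(String field) {
        // The weight is built from the factory's own similarity for the
        // field, so the two cannot disagree.
        FieldSimilarity sim = similarityFor(field);
        return freq -> sim.tf(freq);
    }
}

class FactoryDemo {
    static float score(String field, int freq) {
        // "title" ignores term frequency beyond presence; other fields
        // use the fallback.
        ScoringFactory factory = new PerFieldScoringFactory()
            .register("title", f -> f > 0 ? 1.0f : 0.0f);
        return factory.weightFor(field).score(freq);
    }

    public static void main(String[] args) {
        System.out.println(score("title", 4));
        System.out.println(score("body", 4));
    }
}
```

The factory could be handed either to the searcher or to the top-level 
query; the sketch does not depend on which.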

> A more logical extension point is probably along the lines of past 
> discussion towards making all of the Similarity methods take in a field 
> name (so you could have a "PerFieldSimilarityWrapper" type implementation) 
> and/or changing Searchable.getSimilarity to take in a fieldname param.
> 
> i don't think anyone ever submitted a patch for either of those ideas 
> though ... if you check the mailing list archives you'll see there were 
> performance concerns about one of them (i think it was the first one 
> because some of those methods are in tight loops, which is unfortunate 
> because it's the one that can be done in a backwards compatible way)



