lucene-dev mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: sloppyFreq - why on Similarity?
Date Mon, 22 Sep 2003 18:15:34 GMT
On Sunday, September 21, 2003, at 03:00 PM, ykingma@xs4all.nl wrote:
> That surprises me. I would have expected that sloppyFreq() would also
> be called for fuzzy terms. In both cases there is a distance
> that influences the effective term frequency.

WildcardQuery and FuzzyQuery can both affect scoring, although only
FuzzyQuery seems to take advantage of this. A Similarity
implementation has no way to influence the factors these queries
apply, though, so there is some inconsistency here.

If I'm wrong about WildcardQuery, let me know, but I don't see that
searching for "luc*" gives higher weight to "luck" than to "lucene",
although it seems that it should. (It uses a hardcoded 1.0 multiplier
for the boost factor of the rewritten TermQuery.)
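
To make the "luck" versus "lucene" point concrete, here is a minimal,
self-contained sketch of one way a wildcard match could be weighted by
how much of the matched term the literal prefix covers. The class and
method names are hypothetical and nothing here is part of Lucene; it
only illustrates the kind of factor that could replace the fixed 1.0.

```java
// Hypothetical illustration only: neither this class nor this method
// exists in Lucene.
final class WildcardBoostSketch {
    private WildcardBoostSketch() {}

    /**
     * Returns the fraction of the matched term covered by the literal
     * prefix of a trailing-wildcard pattern such as "luc*".
     * Shorter matches score higher: "luck" beats "lucene" for "luc".
     * Returns 0 when the term does not start with the prefix.
     */
    static float prefixCoverageBoost(String prefix, String term) {
        if (term.isEmpty() || !term.startsWith(prefix)) {
            return 0.0f;
        }
        return (float) prefix.length() / term.length();
    }
}
```

With this assumed formula, prefixCoverageBoost("luc", "luck") is 3/4
and prefixCoverageBoost("luc", "lucene") is 3/6, so "luck" would rank
ahead of "lucene".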

> Actually I would prefer to have two different scoring
> methods for sloppy phrases and fuzzy terms.

And a different one for wildcard queries?  It seems, at least to my
newbie mindset, that Similarity is carrying around too much, although
it is (or at least sells itself as) a one-stop place for all
score-related tweaks.  But there are exceptions, such as the
MultiTermQuery subclasses WildcardQuery and FuzzyQuery.

Is there a need to unify these types of tweaks into Similarity?

> The Similarity interface is used for determining how similar a document
> is to a query. I think sloppyFreq() is well placed there, given
> the current default implementation that 'works back' from (sloppy)
> phrase frequencies (and fuzzy term frequencies?) to normal term 
> frequencies.

So now we need a getFuzzyFreq and getWildcardFreq?!  :)
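
For illustration, a rough standalone sketch of what such separate hooks
might look like. Only sloppyFreq mirrors something real: as far as I
know, the default implementation computes 1 / (distance + 1). The
getFuzzyFreq and getWildcardFreq methods are hypothetical, as are their
formulas, and the class does not extend or use any Lucene type.

```java
// Standalone sketch; does not depend on Lucene.
final class FreqHooksSketch {
    private FreqHooksSketch() {}

    /** Mirrors the default sloppy-phrase formula: 1 / (distance + 1). */
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    /**
     * Hypothetical hook: applies the same decay to the edit distance
     * between the query term and the matched term.
     */
    static float getFuzzyFreq(int editDistance) {
        return 1.0f / (editDistance + 1);
    }

    /**
     * Hypothetical hook: a constant 1.0 for every wildcard match,
     * standing in for the hardcoded multiplier mentioned above.
     */
    static float getWildcardFreq(String pattern, String matchedTerm) {
        return 1.0f;
    }
}
```

An exact match (distance 0) yields 1.0 from either of the first two
methods, and the value falls off as the distance grows.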

	Erik

