On Sunday, September 21, 2003, at 03:00 PM, ykingma@xs4all.nl wrote:
> That surprises me. I would have expected sloppyFreq() to be called
> for fuzzy terms as well. In both cases there is a distance
> that influences the effective term frequency.
WildcardQuery and FuzzyQuery are both able to affect scoring,
although only FuzzyQuery seems to take advantage of this.
However, there is no way for a Similarity implementation to affect the
factors these queries apply, so things are somewhat inconsistent here.
If I'm wrong about WildcardQuery, let me know, but I don't see that
searching for "luc*" gives higher weight to "luck" than to "lucene",
although it seems that it should. (It uses a hardcoded 1.0 multiplier
as the boost factor for each rewritten TermQuery.)
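To illustrate the difference, here is a rough model (not Lucene source) of the per-term boost a MultiTermQuery-style rewrite hands to each expanded TermQuery. The class and method names are hypothetical, and the fuzzy formula (1 minus edit distance over the shorter term length) is an assumed approximation of what FuzzyQuery does; the wildcard side just returns the constant 1.0 described above.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Rough model (not Lucene source) of the boost a MultiTermQuery-style rewrite
 * could give each expanded term. Class and method names are hypothetical;
 * the fuzzy formula is an assumed approximation.
 */
public class RewriteBoostSketch {

    // Standard dynamic-programming Levenshtein edit distance.
    static int editDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }

    // Assumed fuzzy boost: 1 - editDistance / shorter length, clamped at 0,
    // so terms closer to the query term get a larger boost.
    static float fuzzyBoost(String queryTerm, String indexTerm) {
        int minLen = Math.min(queryTerm.length(), indexTerm.length());
        if (minLen == 0) return 0f;
        return Math.max(0f, 1f - (float) editDistance(queryTerm, indexTerm) / minLen);
    }

    // Wildcard boost: a constant 1.0, however much of the term the pattern covers.
    static float wildcardBoost(String pattern, String indexTerm) {
        return 1.0f;
    }

    public static void main(String[] args) {
        List<String> expanded = Arrays.asList("luck", "lucene");
        for (String term : expanded) {
            System.out.println(term
                    + "  luc* boost=" + wildcardBoost("luc*", term)
                    + "  lucky~ boost=" + fuzzyBoost("lucky", term));
        }
    }
}
```

Under this model "luck" and "lucene" tie for "luc*", while for "lucky~" the closer term "luck" gets the larger boost. That asymmetry is the one being pointed out here.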
> Actually I would prefer to have two different scoring
> methods for sloppy phrases and fuzzy terms.
And a different one for wildcard queries? It seems, at least to my
newbie mindset, that Similarity is carrying around too much, although
it is a one-stop place (or seems to sell itself that way) for all
score-related tweaks. But there are exceptions, such as the
MultiTermQuery subclasses WildcardQuery and FuzzyQuery.
Is there a need to unify these types of tweaks into Similarity?
> The Similarity interface is used for determining how similar a document
> is to a query. I think sloppyFreq() is well placed there, given
> the current default implementation that 'works back' from (sloppy)
> phrase frequencies (and fuzzy term frequencies?) to normal term
> frequencies.
So now we need a getFuzzyFreq and getWildcardFreq?! :)
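For reference, a rough sketch of the "working back" from sloppy phrase matches to a term frequency. It assumes the formulas I understand DefaultSimilarity to use, sloppyFreq(distance) = 1 / (distance + 1) and tf(freq) = sqrt(freq), with each match's contribution summed into one pseudo-frequency; the class name and the array of match distances are hypothetical inputs, not Lucene API.

```java
/**
 * Sketch of turning sloppy phrase matches into a pseudo term frequency.
 * sloppyFreq and tf assume DefaultSimilarity-style formulas; the class
 * name and the match-distance input are hypothetical.
 */
public class SloppyFreqSketch {

    // A match at edit distance d counts as 1 / (d + 1) of an exact occurrence.
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    // Term-frequency factor applied to the accumulated pseudo-frequency.
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Sum each match's contribution, as a sloppy phrase scorer would.
    static float phraseFreq(int[] matchDistances) {
        float freq = 0f;
        for (int d : matchDistances) {
            freq += sloppyFreq(d);
        }
        return freq;
    }

    public static void main(String[] args) {
        int[] distances = {0, 2}; // one exact match, one match needing 2 positions of slop
        float freq = phraseFreq(distances);
        System.out.println("freq=" + freq + " tf=" + tf(freq));
    }
}
```

A fuzzy or wildcard equivalent would need a similar per-match distance hook, which is exactly what Similarity does not currently offer those queries.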
Erik