lucene-dev mailing list archives

Subject Re: sloppyFreq - why on Similarity?
Date Sun, 21 Sep 2003 19:00:34 GMT

> I'm just trying to come to "terms" (haha - sorry, bad pun) with all the
> glory that is Lucene.  I'm deep into PhraseQuery at the moment (big
> test case being committed soon) and have an API question.  Why is this
> method:
>    public abstract float sloppyFreq(int distance);
> on Similarity?
> The only place it is called is in SloppyPhraseScorer.  After thinking

That surprises me. I would have expected sloppyFreq() to be called
for fuzzy terms as well. In both cases there is a distance
that influences the effective term frequency.
(See 'Fuzzy searches' and 'Proximity searches' in

Actually, I would prefer to have two different scoring
methods: one for sloppy phrases and one for fuzzy terms.
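To make that suggestion concrete, here is a minimal sketch of what a split API could look like. Everything in it is hypothetical: `SplitSimilarity`, `sloppyPhraseFreq`, `fuzzyTermFreq` and the example weighting are illustrative names and choices, not part of Lucene. In particular, the fuzzy weighting (fraction of characters left unchanged) is just one plausible choice and is not FuzzyQuery's actual formula.

```java
// HYPOTHETICAL API sketch: separate scoring hooks for sloppy phrases and fuzzy terms.
// None of these names exist in Lucene; they only illustrate the proposal.
interface SplitSimilarity {
    /** Weight of one phrase match that needed `slop` position moves to line up. */
    float sloppyPhraseFreq(int slop);

    /** Weight of one fuzzy-term match at `editDistance` from a query term of `termLength` chars. */
    float fuzzyTermFreq(int editDistance, int termLength);
}

class ExampleSplitSimilarity implements SplitSimilarity {
    @Override
    public float sloppyPhraseFreq(int slop) {
        // Exact phrase (slop 0) counts as 1; looser matches count less.
        return 1.0f / (slop + 1);
    }

    @Override
    public float fuzzyTermFreq(int editDistance, int termLength) {
        // Illustrative assumption: weight by the fraction of characters left unchanged.
        if (termLength <= 0) {
            return 0.0f;
        }
        float kept = 1.0f - (float) editDistance / termLength;
        return Math.max(0.0f, kept); // never negative, even for large edit distances
    }
}

public class SplitSimilarityDemo {
    public static void main(String[] args) {
        SplitSimilarity sim = new ExampleSplitSimilarity();
        System.out.println("phrase, slop 2:      " + sim.sloppyPhraseFreq(2));
        System.out.println("fuzzy, 1 edit of 4:  " + sim.fuzzyTermFreq(1, 4)); // 0.75
    }
}
```

The point of the split is that each query type can be tuned independently: changing how phrase slop is penalized would no longer silently affect fuzzy-term scoring, or vice versa.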

> about this for a bit, it sort of seems out of place and not really
> something the main scoring formula needs to concern itself with.  I'm
> guessing the main reason it's here is to allow custom Similarity
> implementations to control phrase queries scoring.  Is that correct?

The Similarity interface determines how similar a document
is to a query. I think sloppyFreq() is well placed there, given
that the current default implementation 'works back' from (sloppy)
phrase frequencies (and fuzzy term frequencies?) to ordinary term frequencies.
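A rough, self-contained sketch of that 'working back' follows. It assumes, as I recall of Lucene's DefaultSimilarity, that each sloppy match contributes `1 / (distance + 1)` and that the summed value is then fed to `tf()` as if it were an ordinary (fractional) term frequency, with `tf(freq) = sqrt(freq)`. The class names (`SketchSimilarity`, `SketchDefaultSimilarity`, `SteepSimilarity`) and the helper `phraseFreq` are hypothetical stand-ins, not Lucene classes. `SteepSimilarity` shows how a custom subclass could penalize distance more sharply.

```java
// HYPOTHETICAL sketch: stand-ins for Similarity/DefaultSimilarity, not Lucene's classes.
abstract class SketchSimilarity {
    /** Weight contributed by one sloppy phrase match that is `distance` moves from exact. */
    public abstract float sloppyFreq(int distance);

    /** Maps a (possibly fractional) frequency to a term-frequency scoring factor. */
    public abstract float tf(float freq);
}

class SketchDefaultSimilarity extends SketchSimilarity {
    @Override
    public float sloppyFreq(int distance) {
        // Exact match (distance 0) counts as 1; looser matches count less.
        return 1.0f / (distance + 1);
    }

    @Override
    public float tf(float freq) {
        return (float) Math.sqrt(freq);
    }
}

// Hypothetical custom similarity: penalizes sloppy matches quadratically.
class SteepSimilarity extends SketchDefaultSimilarity {
    @Override
    public float sloppyFreq(int distance) {
        float d = distance + 1;
        return 1.0f / (d * d);
    }
}

public class SloppyPhraseDemo {
    /** Sums per-match sloppy weights into one fractional "phrase frequency". */
    static float phraseFreq(SketchSimilarity sim, int[] matchDistances) {
        float freq = 0.0f;
        for (int d : matchDistances) {
            freq += sim.sloppyFreq(d);
        }
        return freq;
    }

    public static void main(String[] args) {
        int[] distances = {0, 1, 3}; // one exact match and two sloppy ones

        SketchSimilarity dflt = new SketchDefaultSimilarity();
        float freq = phraseFreq(dflt, distances); // 1 + 0.5 + 0.25
        System.out.println("default: freq = " + freq + ", tf = " + dflt.tf(freq));

        SketchSimilarity steep = new SteepSimilarity();
        float steepFreq = phraseFreq(steep, distances); // 1 + 0.25 + 0.0625
        System.out.println("steep:   freq = " + steepFreq + ", tf = " + steep.tf(steepFreq));
    }
}
```

Because the phrase scorer only ever sees the summed frequency, overriding `sloppyFreq()` in a subclass is enough to change how phrase proximity affects scoring, which is the kind of control the original question asks about.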

