lucene-java-user mailing list archives

From Peter Keegan <peterlkee...@gmail.com>
Subject sloppyFreq question
Date Tue, 03 Mar 2009 19:42:40 GMT
The DefaultSimilarity class defines sloppyFreq as:

public float sloppyFreq(int distance) {
  return 1.0f / (distance + 1);
}

For a 'SpanNearQuery', this reduces the effect of the term frequency on the
score as the number of terms in the span increases. So, for a simple phrase
query (using spans), the longer the phrase, the lower the TF. For a simple
SpanTermQuery, the TF is cut in half (1.0f / (1 + 1)).
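
To make the arithmetic concrete, here is a minimal stand-alone sketch. It copies the formula quoted above into a hypothetical class, SloppyFreqDemo, instead of calling Lucene, and it assumes the span scorer passes the width of the matched span (end minus start) as the distance, so a single-term span has a width of 1 and contributes 0.5.

```java
public class SloppyFreqDemo {
    // Same formula as the DefaultSimilarity.sloppyFreq method quoted above.
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    public static void main(String[] args) {
        // Assumption: the scorer passes the span width (end - start) as the distance.
        // Width 1 is a single-term span; larger widths are longer phrases.
        for (int width = 1; width <= 4; width++) {
            System.out.println("span width " + width + " -> contribution " + sloppyFreq(width));
        }
    }
}
```

Each additional term in the span shrinks the per-match contribution further, which is the effect described above.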

I'm just wondering why this is the default behavior. For 'SpanTermQuery',
I'd expect the TF to reflect the actual number of occurrences of the term.
For a SpanNearQuery, wouldn't it still be the number of occurrences of the
whole span, not the number of terms in the span?
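
To illustrate the difference I mean, here is a rough sketch comparing the two ways of accumulating TF over a document's span matches. The class and method names are hypothetical and nothing here calls Lucene; it assumes the scorer sums one sloppyFreq value per match, using the span width as the distance.

```java
public class SpanFreqComparison {
    // Formula from DefaultSimilarity.sloppyFreq, as quoted above.
    static float defaultSloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    // Assumed default behavior: each match adds 1 / (width + 1) to the TF.
    static float defaultFreq(int[] matchWidths) {
        float freq = 0f;
        for (int width : matchWidths) {
            freq += defaultSloppyFreq(width);
        }
        return freq;
    }

    // What I would expect instead: each match counts as one occurrence.
    static float occurrenceFreq(int[] matchWidths) {
        return matchWidths.length;
    }

    public static void main(String[] args) {
        int[] termMatches = {1, 1, 1}; // three matches of a single-term span
        System.out.println("default TF:    " + defaultFreq(termMatches));
        System.out.println("occurrence TF: " + occurrenceFreq(termMatches));
    }
}
```

With three single-term matches, the first method accumulates 1.5 while the second gives 3.0, the actual number of occurrences.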

Thanks,
Peter
