lucene-java-user mailing list archives

From Chris Hostetter
Subject Re: sloppyFreq question
Date Wed, 11 Mar 2009 19:12:12 GMT

: For a 'SpanNearQuery', this reduces the effect of the term frequency on the
: score as the number of terms in the span increases. So, for a simple phrase
: query (using spans), the longer the phrase, the lower the TF. For a simple
: SpanTermQuery, the TF is reduced in half (1.0f / (1 + 1)).
: I'm just wondering why this is the default behavior. For 'SpanTermQuery',
: I'd expect the TF to reflect the actual number of occurrences of the term.
: For a SpanNearQuery, wouldn't it still be the number of occurrences of the
: whole span, not the number of terms in the span?

I believe it's because a Span typically encompasses multiple positions -- 
there's no advantage I can think of for executing a SpanTermQuery 
directly.  Note that when you execute a SpanQuery, it doesn't pay any 
attention to the tf/idf of any nested queries; it only looks at the 
aggregated Spans.
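
To make the arithmetic concrete, here is a minimal standalone sketch of how 
the span length feeds into the frequency. It assumes each matching span 
contributes sloppyFreq(end - start), with an exclusive end position, and that 
the contributions are summed per document. The class name, the spanFreq 
helper, and the int[][] representation of spans are made up for illustration; 
this is not the actual Lucene scorer code.

```java
public class SloppyFreqSketch {

    // The default formula quoted in the thread: 1 / (distance + 1).
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    // Hypothetical helper: sums the contribution of every span in one document.
    // Each span is {start, end} with an exclusive end (assumption), so a single
    // term occupies a span of length 1.
    static float spanFreq(int[][] spans) {
        float freq = 0.0f;
        for (int[] span : spans) {
            int matchLength = span[1] - span[0];
            freq += sloppyFreq(matchLength);
        }
        return freq;
    }

    public static void main(String[] args) {
        // Two occurrences of a single term: each span covers one position.
        int[][] termSpans = { {3, 4}, {10, 11} };
        // Two occurrences of a three-term phrase: each span covers three positions.
        int[][] phraseSpans = { {3, 6}, {10, 13} };

        System.out.println("term spans freq:   " + spanFreq(termSpans));
        System.out.println("phrase spans freq: " + spanFreq(phraseSpans));
    }
}
```

Under those assumptions, two single-term matches add up to a frequency of 
1.0 rather than 2, and two three-term phrase matches add up to 0.5, which is 
the "longer span, lower TF" effect described in the question.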

I suppose SpanTermQuery could override the weight/scorer methods so that 
it behaved more like a TermQuery if it was executed directly ... but 
that's really not what it's intended for.

(it's unfortunate that all of the SpanQueries use a hierarchical class 
structure instead of having a single SpanQuery that composes a 
"SpanClause" hierarchy)

