lucene-java-user mailing list archives

From Doug Cutting <cutt...@lucene.com>
Subject Re: Lucene's Ranking Function
Date Wed, 11 Sep 2002 21:14:30 GMT
Clemens Marschner wrote:
> 1. I think the new document boost is missing, isn't it?
> With that it should be something like
> 
>  score_d = sum_t(tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t * boost_t)
> * coord_q_d * boost_d
> Is that correct?

Almost.  The final boost_d should actually be boost_d * boost_d_t: the
boost factor for the document multiplied by the boost for t's field in d.
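
As a rough illustration, the formula from question 1 with this correction applied might be sketched in Java as follows. This is not Lucene code: the class ScoreSketch, the holder TermStats, and all field names are hypothetical and simply mirror the symbols used in this thread. Because boost_d_t depends on the term t, the sketch assumes it belongs inside the sum, while boost_d stays outside.

```java
// Hypothetical sketch of the ranking formula discussed in this thread.
// None of these names are Lucene classes; they mirror the thread's symbols.
final class ScoreSketch {

    /** Per-term inputs for one (query, document) pair. */
    static final class TermStats {
        final double tfQ;     // tf_q: frequency of t in the query
        final double tfD;     // tf_d: frequency of t in the document
        final double idf;     // idf_t
        final double normDT;  // norm_d_t: length normalization for t's field in d
        final double boostT;  // boost_t: query-time boost of term t
        final double boostDT; // boost_d_t: boost of t's field in d

        TermStats(double tfQ, double tfD, double idf,
                  double normDT, double boostT, double boostDT) {
            this.tfQ = tfQ;
            this.tfD = tfD;
            this.idf = idf;
            this.normDT = normDT;
            this.boostT = boostT;
            this.boostDT = boostDT;
        }
    }

    /**
     * score_d = sum_t(tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t
     *                 * boost_t * boost_d_t) * coord_q_d * boost_d
     * Assumption: boost_d_t is placed inside the sum because it varies with t.
     */
    static double score(TermStats[] terms, double normQ,
                        double coordQD, double boostD) {
        double sum = 0.0;
        for (TermStats t : terms) {
            double queryPart = t.tfQ * t.idf / normQ;
            double docPart = t.tfD * t.idf / t.normDT;
            sum += queryPart * docPart * t.boostT * t.boostDT;
        }
        return sum * coordQD * boostD;
    }
}
```

Passing 1.0 for normDT in this sketch corresponds to leaving out the norm_d_t factor, as asked in question 2.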

> 2. If I like the score to be independent of the number of terms in the
> document (regarding them as essentially constant), is it enough to leave out
> the norm_d_t factor?

Yes.  Note however that the quantity called 'norm' in the code is now 
frequently the product norm_d_t * boost_t * boost_d_t.  This quantity is 
now computed at index time and stored in the norms file.

> I have seen that a norm factor between 0 and 255 is read with
> IndexReader.norms() in TermScorer.score(). Is that the one?

Yes, although see my note above.

> From what I further understand (and from digging in Witten/Moffat/Bell) the
> norm_q factor is not calculated, since it stays the same for one query.

Lucene calculates it anyway.  It's cheap to compute: it is multiplied 
together with the term boost and idf once per query term, then this 
weight is used in subsequent computations.
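
The once-per-query-term weighting described above could be sketched like this. The class QueryWeightSketch, its method names, and the parameter queryNormFactor are hypothetical and not taken from Lucene; the sketch assumes the query norm enters as a single multiplicative factor, and it lumps everything on the document side into one value per term (docFactor).

```java
// Hypothetical sketch: precompute one weight per query term, then reuse it
// for every document. Names are illustrative, not Lucene API.
final class QueryWeightSketch {

    /** Computed once per query: weight_t = queryNormFactor * boost_t * idf_t. */
    static double[] termWeights(double[] idf, double[] boostT,
                                double queryNormFactor) {
        double[] weights = new double[idf.length];
        for (int t = 0; t < idf.length; t++) {
            weights[t] = queryNormFactor * boostT[t] * idf[t];
        }
        return weights;
    }

    /**
     * Per-document work: only a multiply-add per matching term.
     * docFactor[t] stands for the document-side quantity for term t
     * (an assumption of this sketch, covering tf, idf and the stored norm).
     */
    static double scoreDoc(double[] weights, double[] docFactor) {
        double sum = 0.0;
        for (int t = 0; t < weights.length; t++) {
            sum += weights[t] * docFactor[t];
        }
        return sum;
    }
}
```

The point of the split is that termWeights runs once per query, while scoreDoc runs once per candidate document, so the query-side factors add no per-document cost.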

Doug

