lucene-dev mailing list archives

From Doug Cutting
Subject RE: question about TermQuery
Date Mon, 08 Oct 2001 15:52:36 GMT
It is correct to include idf twice.  Recall that the weighting is roughly:
  (tf_q * idf_t / norm_q) * (tf_d * idf_t / norm_d)

The TermQuery.weight field has all of this that is not document specific,
i.e., everything except tf_d and norm_d, so weight should be:
  (tf_q * idf_t / norm_q) * idf_t

The code is a little different, since we don't calculate tf_q, the frequency
of the term in the query, assuming that it is one, and instead use a 'boost'
factor.  But the term's idf (idf_t) should really be in there twice.
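
As a rough illustration of the formula above, here is a minimal sketch in
Java.  It is not the actual Lucene code: the class and method names
(TermWeightSketch, queryWeight, score) are made up for this example, and
'boost' stands in for tf_q as described above.

```java
class TermWeightSketch {
    // Query-side part of the score: everything that is not document specific.
    // boost stands in for tf_q; normQ is the query normalization factor norm_q.
    static float queryWeight(float boost, float idf, float normQ) {
        float queryTerm = boost * idf / normQ; // (tf_q * idf_t / norm_q)
        return queryTerm * idf;                // times the idf_t of the document side
    }

    // Full score: the query-side weight times the document-specific factors.
    static float score(float weight, float tfD, float normD) {
        return weight * tfD / normD;           // weight * (tf_d / norm_d)
    }

    public static void main(String[] args) {
        float w = queryWeight(1.0f, 2.0f, 4.0f);
        System.out.println(w);                    // prints 1.0
        System.out.println(score(w, 3.0f, 1.5f)); // prints 2.0
    }
}
```

Note that idf is multiplied in twice inside queryWeight(): once for the
query term and once on behalf of the document term.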

The query normalization factor, 1/norm_q, is calculated based on the value
returned by sumOfSquaredWeights(), and is passed back in through the
normalize() method.

normalize() is only called once per call to sumOfSquaredWeights().  These
calls are initiated on a Query in the method Query.scorer():

  static Scorer scorer(Query query, Searcher searcher, IndexReader reader)
    throws IOException {
    float sum = query.sumOfSquaredWeights(searcher);  // sets idf and weight
    float norm = 1.0f / (float)Math.sqrt(sum);        // 1/norm_q
    query.normalize(norm);                            // once per sum above
    return query.scorer(reader);
  }
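
To show why repeated calls do not compound the idf, here is a small
self-contained sketch.  The names NormalizeSketch, TermWeight and prepare are
hypothetical, and the idf is passed in as a plain number instead of being
computed by Similarity.idf(term, searcher); only the bodies of
sumOfSquaredWeights() and normalize() follow the TermQuery code quoted below.

```java
class NormalizeSketch {
    // Hypothetical stand-in for TermQuery, reduced to the weighting logic.
    static class TermWeight {
        final float boost;
        final float idfValue; // assumed precomputed idf for the term
        float idf;
        float weight;

        TermWeight(float boost, float idfValue) {
            this.boost = boost;
            this.idfValue = idfValue;
        }

        float sumOfSquaredWeights() {
            idf = idfValue;
            weight = idf * boost;   // assignment, so weight is reset each time
            return weight * weight; // square term weights
        }

        void normalize(float norm) {
            weight *= norm;         // normalize for query
            weight *= idf;          // factor from document
        }
    }

    // Pairs one normalize() with each sumOfSquaredWeights(), as Query.scorer() does.
    static float prepare(TermWeight q) {
        float sum = q.sumOfSquaredWeights();
        float norm = 1.0f / (float) Math.sqrt(sum);
        q.normalize(norm);
        return q.weight;
    }

    public static void main(String[] args) {
        TermWeight q = new TermWeight(1.0f, 2.0f);
        System.out.println(prepare(q)); // prints 2.0
        System.out.println(prepare(q)); // prints 2.0 again, idf has not accumulated
    }
}
```

Because sumOfSquaredWeights() assigns weight rather than multiplying into it,
each pass starts from idf * boost, and the second idf factor is applied
exactly once per pass.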

So it all looks okay to me.


> -----Original Message-----
> From: Dmitry Serebrennikov []
> Sent: Sunday, October 07, 2001 2:19 PM
> To:
> Subject: question about TermQuery
> I'm looking through the TermQuery code (and generally trying to
> understand exactly how the searching works) and I found this code that
> looks suspicious to me. It is very likely that I just don't understand
> what's going on, but there is a chance that this is a bug, so I wanted
> to ask for clarification / review from Doug and others.
>
> In TermQuery.normalize(float norm), weight is multiplied first by the
> normalization factor (the argument) and then by the idf that was stored
> in the TermQuery before. Although I can't say for sure that this is
> wrong, it does look suspect. First, idf is already factored into weight
> in the sumOfSquaredWeights() method, and second, if normalize is called
> multiple times, idf will be multiplied into weight over and over...
> Plus the comment in normalize doesn't really make sense, and the way the
> code is written makes me think that this is a problem caused by a CVS
> merge conflict, and that only the line "weight *= norm" should be in
> that method. Am I right?
> ======================================================
>   final float sumOfSquaredWeights(Searcher searcher) throws IOException {
>     idf = Similarity.idf(term, searcher);
>     weight = idf * boost;
>     return weight * weight;              // square term weights
>   }
>   final void normalize(float norm) {
>     weight *= norm;                  // normalize for query
>     weight *= idf;                  // factor from document
>   }
> ======================================================
