lucene-dev mailing list archives

From Karl Wright <daddy...@yahoo.com>
Subject Possible bug in scoring function for TermQuery?
Date Sun, 22 May 2005 02:59:30 GMT
The following code in the TermWeight inner class of TermQuery seems inconsistent:
 
    public float sumOfSquaredWeights() throws IOException {
      idf = getSimilarity(searcher).idf(term, searcher); // compute idf
      queryWeight = idf * getBoost();             // compute query weight
      return queryWeight * queryWeight;           // square it
    }
 
    public void normalize(float queryNorm) {
      this.queryNorm = queryNorm;
      queryWeight *= queryNorm;                   // normalize query weight
      // KDW - extra idf term makes no sense!!!
      value = queryWeight * idf;                  // idf for document 
    }

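The inconsistency is this: when normalizing a query with only one term, the normalized weight
value should be unity (1.0). In that case, the queryNorm passed into the normalize() method
will be sqrt(1/sumOfSquaredWeights()) = 1/(idf * boost), so queryWeight is already exactly 1.0
after normalization, and multiplying it by idf again leaves value equal to idf instead of 1.0.
The extra idf factor in normalize() thus seems to be superfluous.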
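
To make the single-term arithmetic concrete, here is a small standalone Java sketch. It assumes the usual call sequence (sumOfSquaredWeights(), then the Similarity's queryNorm(), then normalize()) and assumes a DefaultSimilarity-style queryNorm of 1/sqrt(sumOfSquaredWeights). The class SingleTermNormCheck, its helper methods, and the idf/boost values are hypothetical and only illustrate the calculation; they are not Lucene code.

```java
// Hypothetical standalone check of the single-term normalization arithmetic.
public class SingleTermNormCheck {

    /** Assumed DefaultSimilarity-style query normalization: 1 / sqrt(sum). */
    public static float queryNorm(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    /**
     * Mirrors sumOfSquaredWeights() followed by normalize() for a one-term query.
     * Returns {normalized queryWeight, value with extra idf, value without it}.
     */
    public static float[] singleTerm(float idf, float boost) {
        float queryWeight = idf * boost;             // compute query weight
        float sum = queryWeight * queryWeight;       // square it
        float norm = queryNorm(sum);                 // assumed 1 / (idf * boost)
        queryWeight *= norm;                         // normalize query weight
        float currentValue = queryWeight * idf;      // current code keeps the extra idf
        float proposedValue = queryWeight;           // proposed code drops it
        return new float[] {queryWeight, currentValue, proposedValue};
    }

    public static void main(String[] args) {
        float idf = 2.5f;    // hypothetical idf for the single term
        float boost = 2.0f;  // hypothetical boost
        float[] r = singleTerm(idf, boost);
        System.out.println("normalized queryWeight = " + r[0]);
        System.out.println("current value          = " + r[1]);
        System.out.println("proposed value         = " + r[2]);
    }
}
```

For any positive idf and boost, the normalized query weight comes out at 1.0 (up to float rounding); the current code then yields value == idf, while the proposed code keeps value at 1.0.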
 
I therefore think that the correct code should be:
 
    public float sumOfSquaredWeights() throws IOException {
      idf = getSimilarity(searcher).idf(term, searcher); // compute idf
      queryWeight = idf * getBoost();             // compute query weight
      return queryWeight * queryWeight;           // square it
    }

    public void normalize(float queryNorm) {
      this.queryNorm = queryNorm;
      queryWeight *= queryNorm;                   // normalize query weight
      // KDW - extra idf term makes no sense; remove it.
      // value = queryWeight * idf;                  // idf for document 
      value = queryWeight;
    }
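
The same reasoning can be checked for a query with several terms: under the proposed normalize(), the per-term values should form a unit-length vector (their squares sum to 1.0), whereas keeping the extra idf factor does not. This sketch assumes, for illustration only, that a multi-term query simply sums each term's squared query weight before calling queryNorm, and again assumes a DefaultSimilarity-style queryNorm of 1/sqrt(sum). The class MultiTermNormSketch, its methods, and the idf values are hypothetical, not part of Lucene.

```java
// Hypothetical sketch comparing the current and proposed normalize() over several terms.
public class MultiTermNormSketch {

    /** Assumed DefaultSimilarity-style query normalization: 1 / sqrt(sum). */
    public static float queryNorm(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    /**
     * Computes each term's "value" field. dropIdf == false mimics the current
     * code (value = queryWeight * idf); dropIdf == true mimics the proposed fix.
     */
    public static float[] values(float[] idfs, float[] boosts, boolean dropIdf) {
        int n = idfs.length;
        float[] queryWeights = new float[n];
        float sum = 0f;
        for (int i = 0; i < n; i++) {
            queryWeights[i] = idfs[i] * boosts[i];       // compute query weight
            sum += queryWeights[i] * queryWeights[i];     // accumulate squared weights
        }
        float norm = queryNorm(sum);
        float[] values = new float[n];
        for (int i = 0; i < n; i++) {
            float qw = queryWeights[i] * norm;            // normalize query weight
            values[i] = dropIdf ? qw : qw * idfs[i];      // proposed vs. current
        }
        return values;
    }

    /** Sum of squares of a vector; 1.0 means unit length. */
    public static float sumOfSquares(float[] v) {
        float s = 0f;
        for (float x : v) {
            s += x * x;
        }
        return s;
    }

    public static void main(String[] args) {
        float[] idfs = {1.5f, 3.0f};    // hypothetical idf values
        float[] boosts = {1.0f, 1.0f};  // default boosts
        System.out.println("current:  " + sumOfSquares(values(idfs, boosts, false)));
        System.out.println("proposed: " + sumOfSquares(values(idfs, boosts, true)));
    }
}
```

With the hypothetical idfs {1.5, 3.0}, the proposed variant's squares sum to 1.0 (within float rounding), while the current variant's sum to (1.5^4 + 3.0^4) / 11.25, about 7.65.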

 
Karl
