lucene-general mailing list archives

From "João Rodrigues" <anar...@gmail.com>
Subject Lucene's Scoring & Regular TF-IDF
Date Mon, 10 Mar 2008 22:05:14 GMT
Hello all!

I asked here a few days ago whether I could get a "raw" tf-idf score out of
Lucene's methods. I was kindly advised to hack my way through the "explain"
method. I have, but I can't make sense of the information it reports. Here is
the output of an explain call for one search hit. My comments and questions are
inline, in bold:


Lucene Score: 1.000000
Explanation:

1.9983159 = (MATCH) weight(contents:chaperone in 73615), product of:
  0.99999994 = queryWeight(contents:chaperone), product of:
    7.3838615 = idf(docFreq=137, numDocs=81725) *-> I calculated this as 2.7756 (using log10) or 6.3911 (using ln)*
    0.13543049 = queryNorm
  1.998316 = (MATCH) fieldWeight(contents:chaperone in 73615), product of:
    1.7320508 = tf(termFreq(contents:chaperone)=3) *-> The doc has 32 tokens (according to Luke) and 3/32 != 1.7320508*
    7.3838615 = idf(docFreq=137, numDocs=81725)
    0.15625 = fieldNorm(field=contents, doc=73615)

---------------------------------------------------------------------------


So, what am I missing? I took the standard tf-idf formula from Wikipedia and
from a few textbooks, so I'm fairly sure my own calculation is right. I didn't
set any boost factor or anything (otherwise I suppose it would also appear
here). I am using the Standard Analyzer, which might account for a somewhat
higher tf, but not for a difference this large.
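
For reference, here is a minimal sketch of the arithmetic behind the printout
above. It compares the textbook values I computed with the formulas documented
for Lucene's DefaultSimilarity, assuming that is the Similarity in use:
tf = sqrt(termFreq), idf = ln(numDocs / (docFreq + 1)) + 1, and
queryNorm = 1 / sqrt(sum of squared weights). The variable names are only
illustrative. The fieldNorm value is copied from the printout rather than
derived, on the assumption that it is 1 / sqrt(number of tokens) stored in a
single byte with reduced precision.

```python
import math

# Values taken from the explain printout and from Luke.
num_docs = 81725
doc_freq = 137
term_freq = 3
doc_tokens = 32

# Textbook tf-idf, as in the bold comments.
textbook_tf = term_freq / doc_tokens                     # 3/32
textbook_idf_log10 = math.log10(num_docs / doc_freq)     # about 2.7756
textbook_idf_ln = math.log(num_docs / doc_freq)          # about 6.3911

# DefaultSimilarity-style factors (assumed, see the note above).
lucene_tf = math.sqrt(term_freq)                         # about 1.7320508
lucene_idf = math.log(num_docs / (doc_freq + 1)) + 1.0   # about 7.3838615

# Unquantized length norm; the index stores a coarser one-byte version.
length_norm = 1.0 / math.sqrt(doc_tokens)                # about 0.1768
field_norm = 0.15625                                     # copied from the printout

# Single-term query with no boost: the only weight is the idf itself.
query_norm = 1.0 / math.sqrt(lucene_idf ** 2)            # about 0.13543049
query_weight = lucene_idf * query_norm                   # about 1.0
field_weight = lucene_tf * lucene_idf * field_norm       # about 1.998316
score = query_weight * field_weight

print(f"textbook tf={textbook_tf:.5f} idf(log10)={textbook_idf_log10:.4f} "
      f"idf(ln)={textbook_idf_ln:.4f}")
print(f"lucene tf={lucene_tf:.7f} idf={lucene_idf:.7f} "
      f"queryNorm={query_norm:.8f}")
print(f"fieldWeight={field_weight:.6f} score={score:.6f}")
```

Under these assumptions the sketch reproduces the tf, idf, queryNorm and
fieldWeight lines of the printout, which suggests the gap to my numbers comes
from the square root on tf, the "+ 1" terms in idf, and the length
normalisation being applied as a separate fieldNorm factor instead of inside tf.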
