lucene-java-user mailing list archives

From "Yonik Seeley" <yo...@apache.org>
Subject Re: idf in scores
Date Tue, 07 Nov 2006 15:14:10 GMT
On 11/7/06, Antony Bowesman <adb@teamware.com> wrote:
> I've been trying to understand how idf is arrived at from a query.  I have a
> single Document with 9 fields.  One field "subject" has the phrase "RFC2822 -
> Internet Message Format" and a second "body" has the contents of rfc2822.
>
> The other fields contain additional meta data.  If I search for subject:message
> I get the following explanation.
>
> 0.15342641 = fieldWeight(subject:message in 0), product of:
>    1.0 = tf(termFreq(subject:message)=1)
>    0.30685282 = idf(docFreq=1)
>    0.5 = fieldNorm(field=subject, doc=0)
>
> why does the idf get that value?

idf is dependent only on the corpus, not on the individual document.
The formula is here:
http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html
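With numDocs=1 and docFreq=1 (your index holds a single document):
idf = 1 + log(numDocs/(docFreq+1)) = 1 + log(1/2) = 0.30685282
(log here is the natural logarithm.)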
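
As a rough check of the numbers in the explanation above, here is a minimal sketch that mirrors the default idf and tf formulas described in the Similarity javadoc. The class name IdfSketch and its method names are made up for illustration and are not part of Lucene; the fieldNorm of 0.5 is simply taken from the explain output rather than computed.

```java
public class IdfSketch {
    // Assumed default formula: idf = ln(numDocs / (docFreq + 1)) + 1
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Assumed default formula: tf = sqrt(termFreq)
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // fieldWeight is the product of tf, idf and fieldNorm, as in the explain output
    static float fieldWeight(float freq, int docFreq, int numDocs, float fieldNorm) {
        return tf(freq) * idf(docFreq, numDocs) * fieldNorm;
    }

    public static void main(String[] args) {
        // One document in the index, and the term occurs in that one document
        System.out.println("idf         = " + idf(1, 1));
        System.out.println("fieldWeight = " + fieldWeight(1f, 1, 1, 0.5f));
    }
}
```

Because docFreq+1 is in the denominator, a term that appears in every document of a one-document index gets an idf below 1. Adding more documents that do not contain the term would raise the value.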

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server
