lucene-java-user mailing list archives

From Kasun Perera <kas...@opensource.lk>
Subject Different Weights to Lucene fields with Okapi Similarity
Date Tue, 19 Jun 2012 09:56:49 GMT
Based on this link, http://www2002.org/CDROM/refereed/643/node6.html, I'm
calculating the Okapi similarity between a query document and another
document using Lucene, with the scheme described below.

I have indexed the documents using 3 fields, and I want to give a higher
weight to field 2 and field 3. I can't use Lucene's boost function since I'm
using my own similarity function. Can anyone suggest a method for giving
different weights to fields with this Okapi similarity function?

This is the Okapi similarity scheme that I have used:

sim(query, doc) = sum(t in terms(query), freq(t, query) * w(t, doc))

where (from the second link, slightly modified as I think the formula in
the link is incorrect)

w(t, doc) = idf(t) * (k+1)*freq(t, doc) / (k*(1-b + b*ls(doc)) + freq(t, doc))

ls(doc) = len(doc)/avgdoclen

and idf(t) = log((totalNumIndexedDocs - docFreq + 0.5) / (docFreq + 0.5)),
where docFreq is the number of indexed documents containing term t, and
freq(t, doc) is the frequency of term t in document doc.

Choosing b = 0.75 and k = 1.2 (so that 1-b = 0.25) you get

w(t, doc) = idf(t) * 2.2*freq(t, doc) / (1.2*(0.25+0.75*ls(doc)) + freq(t, doc))
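For reference, here is a minimal sketch of the scheme above in plain Java. It does not use any Lucene API: the class name OkapiSketch, the method names, and the map-based inputs are all hypothetical, and in practice the statistics (term frequencies, docFreq, document lengths) would have to be read from the index. It assumes k = 1.2 and b = 0.75, the value implied by the 0.25 + 0.75*ls(doc) term in the expanded formula.

```java
import java.util.Map;

/** Illustrative sketch of the Okapi scheme above; names are hypothetical. */
class OkapiSketch {
    static final double K = 1.2;   // k from the formula above
    static final double B = 0.75;  // assumed b, giving 1-b = 0.25

    /** idf(t) = log((totalNumIndexedDocs - docFreq + 0.5) / (docFreq + 0.5)) */
    static double idf(long totalNumIndexedDocs, long docFreq) {
        return Math.log((totalNumIndexedDocs - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** w(t, doc) = idf(t) * (k+1)*freq(t, doc) / (k*(1-b + b*ls(doc)) + freq(t, doc)) */
    static double weight(double idf, double freqInDoc, double docLen, double avgDocLen) {
        double ls = docLen / avgDocLen;  // ls(doc) = len(doc)/avgdoclen
        return idf * (K + 1) * freqInDoc / (K * (1 - B + B * ls) + freqInDoc);
    }

    /** sim(query, doc) = sum over query terms t of freq(t, query) * w(t, doc) */
    static double sim(Map<String, Integer> queryTermFreqs,
                      Map<String, Integer> docTermFreqs,
                      Map<String, Long> docFreqs,
                      long totalNumIndexedDocs,
                      double docLen,
                      double avgDocLen) {
        double score = 0.0;
        for (Map.Entry<String, Integer> e : queryTermFreqs.entrySet()) {
            int freqInDoc = docTermFreqs.getOrDefault(e.getKey(), 0);
            if (freqInDoc == 0) {
                continue;  // a term absent from the document contributes nothing
            }
            long docFreq = docFreqs.getOrDefault(e.getKey(), 0L);
            double w = weight(idf(totalNumIndexedDocs, docFreq), freqInDoc, docLen, avgDocLen);
            score += e.getValue() * w;
        }
        return score;
    }
}
```

Note that idf(t) as defined here becomes negative when a term occurs in more than half of the indexed documents, so such terms lower the score rather than raise it.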

-- 
Regards

Kasun Perera
