lucene-java-user mailing list archives

From Dwaipayan Roy <dwaipayan....@gmail.com>
Subject Explain Scoring function in LMJelinekMercerSimilarity Class
Date Tue, 20 Dec 2016 17:21:07 GMT
Hello,

Can anyone help me understand the scoring function in the
LMJelinekMercerSimilarity class?

The scoring function in LMJelinekMercerSimilarity is shown below:
--------------------------------------------------------
float score = stats.getTotalBoost() *
    (float) Math.log(1 + ((1 - lambda) * freq / docLen) /
        (lambda * ((LMStats) stats).getCollectionProbability()));
--------------------------------------------------------
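To check my understanding, here is a minimal standalone transcription of the expression above into plain Java. It is only a sketch: the class and method names are my own (hypothetical), and freq, docLen, the collection probability and the boost are passed in as parameters instead of being read from Lucene's statistics objects.

```java
// Standalone transcription of the per-term LM-JM score from
// LMJelinekMercerSimilarity. Class and method names are hypothetical; the
// inputs are passed in directly instead of coming from Lucene's statistics.
class JmTermScore {

    // boost  : query boost (assumed to be 1 for an unboosted query)
    // lambda : Jelinek-Mercer weight given to the collection model
    // freq   : frequency of the term in the document
    // docLen : document length in tokens
    // pC     : collection probability of the term, p(t|C)
    static float score(float boost, float lambda, float freq, float docLen, float pC) {
        // Same expression as the Lucene excerpt, with the stats calls replaced by parameters.
        return boost * (float) Math.log(1 + ((1 - lambda) * freq / docLen) / (lambda * pC));
    }

    public static void main(String[] args) {
        // Made-up inputs: lambda = 0.7, freq = 3, docLen = 100, p(t|C) = 0.001.
        System.out.println(score(1f, 0.7f, 3f, 100f, 0.001f));
    }
}
```

With these made-up numbers the argument of the log is 1 + 0.009 / 0.0007, about 13.86, so the score is about 2.63. Note that the boost simply scales the whole term score linearly.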

Can anyone explain the equation? I understand the within-document part of the
score, i.e. (1 - lambda) * freq / docLen, but not how the rest of the
expression is derived.
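For reference, here is how I currently read the expression, assuming it comes from the standard Jelinek-Mercer smoothed query-likelihood model (as described by Zhai and Lafferty). This is my own derivation, not something taken from the Lucene documentation, so please correct me if it is wrong. The smoothed term probability and the query log-likelihood are:

$$p_\lambda(t \mid d) = (1-\lambda)\,\frac{\mathrm{tf}(t,d)}{|d|} + \lambda\, p(t \mid C)$$

$$\log p(q \mid d) = \sum_{t \in q} \log p_\lambda(t \mid d) = \sum_{t \in q,\ \mathrm{tf}(t,d) > 0} \log \frac{p_\lambda(t \mid d)}{\lambda\, p(t \mid C)} + \sum_{t \in q} \log\bigl(\lambda\, p(t \mid C)\bigr)$$

The split works because a term with tf(t,d) = 0 has p_λ(t|d) = λ p(t|C). The second sum does not depend on d, so it can be dropped for ranking, and each remaining term is

$$\log \frac{p_\lambda(t \mid d)}{\lambda\, p(t \mid C)} = \log\left(1 + \frac{(1-\lambda)\,\mathrm{tf}(t,d)/|d|}{\lambda\, p(t \mid C)}\right)$$

If this is right, it is exactly the Lucene expression with getTotalBoost() = 1, and only documents containing the term need to be scored, since a missing term contributes log 1 = 0.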

I assume getCollectionProbability() returns col_freq(t) / col_size. Am I
right?
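To make the question concrete, these are the two definitions of the collection probability I have in mind, written as a small hypothetical helper class. The add-one version is what I believe Lucene's LMSimilarity.DefaultCollectionModel computes from the field's totalTermFreq and numberOfFieldTokens, but I have not verified this for every version, so treat it as an assumption.

```java
// Two candidate definitions of p(t|C). Which one Lucene actually uses is an
// assumption to be checked against the Lucene source.
class CollectionProbability {

    // Plain maximum-likelihood estimate: col_freq(t) / col_size.
    static float mle(long colFreq, long colSize) {
        return (float) colFreq / colSize;
    }

    // Add-one variant: (col_freq(t) + 1) / (col_size + 1). Believed (not verified)
    // to match LMSimilarity.DefaultCollectionModel; never returns zero.
    static float addOne(long colFreq, long colSize) {
        return (colFreq + 1f) / (colSize + 1f);
    }

    public static void main(String[] args) {
        // Made-up counts: the term occurs 50 times in a field of 1,000,000 tokens.
        System.out.println(mle(50, 1_000_000));
        System.out.println(addOne(50, 1_000_000));
    }
}
```

For these counts the two estimates are nearly identical (5.0e-5 vs. about 5.1e-5); they only differ noticeably for very rare terms or small collections.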

Also, the role of the boost factor, stats.getTotalBoost(), is not clear to me.

I want to reproduce the LM-JM scores myself, so I need these details.
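This is roughly what I intend to compare Lucene's output against: a toy in-memory version of LM-JM ranking. It is a sketch under my own assumptions, not Lucene's code: documents are pre-tokenized lists, docLen is the exact token count, p(t|C) uses the add-one estimate, and a document's score for a multi-term query is taken to be the plain sum of its boosted per-term scores. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy LM-JM ranking over pre-tokenized documents (hypothetical helper, not Lucene).
class JmReproduce {

    // col_freq(t): occurrences of each term across all documents.
    static Map<String, Long> collectionFreqs(List<List<String>> docs) {
        Map<String, Long> cf = new HashMap<>();
        for (List<String> doc : docs) {
            for (String t : doc) {
                cf.merge(t, 1L, Long::sum);
            }
        }
        return cf;
    }

    // col_size: total number of tokens in the collection.
    static long collectionSize(List<List<String>> docs) {
        long n = 0;
        for (List<String> doc : docs) {
            n += doc.size();
        }
        return n;
    }

    // Sum of boosted per-term scores; query maps term -> boost.
    // Terms absent from the document would contribute log(1) = 0, so they are skipped.
    static float score(List<String> doc, Map<String, Float> query, float lambda,
                       Map<String, Long> cf, long colSize) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : doc) {
            tf.merge(t, 1, Integer::sum);
        }
        float docLen = doc.size();
        float total = 0f;
        for (Map.Entry<String, Float> e : query.entrySet()) {
            Integer freq = tf.get(e.getKey());
            if (freq == null) {
                continue;
            }
            // Assumed add-one collection model; replace with cf / colSize to compare.
            float pC = (cf.getOrDefault(e.getKey(), 0L) + 1f) / (colSize + 1f);
            total += e.getValue()
                    * (float) Math.log(1 + ((1 - lambda) * freq / docLen) / (lambda * pC));
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("lucene", "language", "model", "smoothing"),
                Arrays.asList("language", "model", "language"),
                Arrays.asList("unrelated", "text"));
        Map<String, Long> cf = collectionFreqs(docs);
        long colSize = collectionSize(docs);
        Map<String, Float> query = new HashMap<>();
        query.put("language", 1f);
        query.put("smoothing", 2f); // made-up boost of 2 on this term
        for (int i = 0; i < docs.size(); i++) {
            System.out.println("doc " + i + ": " + score(docs.get(i), query, 0.7f, cf, colSize));
        }
    }
}
```

One likely source of small mismatches is docLen: as far as I know, Lucene derives it from the stored norm rather than the exact token count, so its value may be less precise than the one used here.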

Thanks.
Dwaipayan Roy
