I think this is the problem that you're running into, though maybe a person
with more expertise can confirm...
ZP, if you look at section 5.1 of the Zhai-Lafferty paper (
http://www.cs.cmu.edu/~lafferty/pub/smoothtois.ps), they note that the
"term weight is log(1 + ((1-\lambda) p_ml(q_i|d)) / (\lambda p(q_i|C)))".
p_ml is freq/docLen, so it looks right, no?
The formula that you are looking at smooths p(q_i|d), but if you look at
equation 6 (disregarding the constant at the end, which we don't need) and
read the paragraph below it, you can see that a term weight in the full log
p(q|d) calculation is more than just p(q_i|d).
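
To make the equation 6 point concrete, here is a minimal sketch on toy
numbers. It is my own illustration, not Lucene code: the class
JmWeightSketch, its method names, and the query, counts and collection
probabilities are all made up. It checks the identity
log((1-\lambda) p_ml(q_i|d) + \lambda p(q_i|C)) =
log(\lambda p(q_i|C)) + log(1 + ((1-\lambda) p_ml(q_i|d)) / (\lambda p(q_i|C))),
where the first term on the right does not depend on the document and the
second is zero for terms that do not occur in the document.

```java
import java.util.Map;

// Hypothetical toy class, not part of Lucene.
final class JmWeightSketch {

    // Full query log-likelihood: sum over ALL query terms of
    // log((1 - lambda) * p_ml(q_i|d) + lambda * p(q_i|C)).
    // Assumes collProb has an entry for every query term.
    static double fullLogLikelihood(String[] query, Map<String, Integer> docFreqs,
                                    int docLen, Map<String, Double> collProb,
                                    double lambda) {
        double sum = 0.0;
        for (String term : query) {
            double pMl = docFreqs.getOrDefault(term, 0) / (double) docLen;
            sum += Math.log((1 - lambda) * pMl + lambda * collProb.get(term));
        }
        return sum;
    }

    // Sum over MATCHING terms only of
    // log(1 + ((1 - lambda) * freq / docLen) / (lambda * p(q_i|C))).
    static double matchedTermScore(String[] query, Map<String, Integer> docFreqs,
                                   int docLen, Map<String, Double> collProb,
                                   double lambda) {
        double sum = 0.0;
        for (String term : query) {
            int freq = docFreqs.getOrDefault(term, 0);
            if (freq == 0) {
                continue; // non-matching terms contribute log(1 + 0) = 0
            }
            sum += Math.log(1 + ((1 - lambda) * freq / docLen)
                    / (lambda * collProb.get(term)));
        }
        return sum;
    }

    // Document-independent part: sum over all query terms of log(lambda * p(q_i|C)).
    static double documentIndependentConstant(String[] query,
                                              Map<String, Double> collProb,
                                              double lambda) {
        double sum = 0.0;
        for (String term : query) {
            sum += Math.log(lambda * collProb.get(term));
        }
        return sum;
    }

    public static void main(String[] args) {
        // Made-up query, document counts and collection probabilities.
        String[] query = {"lucene", "smoothing", "model"};
        Map<String, Integer> docFreqs = Map.of("lucene", 3, "model", 1);
        Map<String, Double> collProb =
                Map.of("lucene", 0.001, "smoothing", 0.0005, "model", 0.01);
        int docLen = 50;
        double lambda = 0.7;

        double full = fullLogLikelihood(query, docFreqs, docLen, collProb, lambda);
        double matched = matchedTermScore(query, docFreqs, docLen, collProb, lambda);
        double constant = documentIndependentConstant(query, collProb, lambda);

        System.out.println("full log p(q|d)    = " + full);
        System.out.println("matched + constant = " + (matched + constant));
    }
}
```

The two printed values agree up to floating-point error, so ranking by the
matched-term sum is the same as ranking by the full log p(q|d). As far as I
understand, the per-term score is only evaluated for terms that occur in the
document, so plugging log of the smoothed probability in there would drop the
non-matching terms and add a negative number for every match, which would be
consistent with the worse results.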
The same goes for Dong's question on Dirichlet smoothing, which also uses a
non-constant \alpha_d, making the math a bit trickier.
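
For the Dirichlet case, a similar sketch under the usual definitions from the
paper: p_s(q_i|d) = (freq + \mu p(q_i|C)) / (docLen + \mu) and
\alpha_d = \mu / (docLen + \mu). Again this is only my toy illustration, not
Lucene code; the class DirichletWeightSketch, its method names and all the
numbers are made up. The matched-term weight becomes
log(1 + freq / (\mu p(q_i|C))), and because \alpha_d depends on docLen, the
n log \alpha_d term has to stay in the score.

```java
import java.util.Map;

// Hypothetical toy class, not part of Lucene.
final class DirichletWeightSketch {

    // Full query log-likelihood: sum over ALL query terms of
    // log((freq + mu * p(q_i|C)) / (docLen + mu)).
    // Assumes collProb has an entry for every query term.
    static double fullLogLikelihood(String[] query, Map<String, Integer> docFreqs,
                                    int docLen, Map<String, Double> collProb,
                                    double mu) {
        double sum = 0.0;
        for (String term : query) {
            int freq = docFreqs.getOrDefault(term, 0);
            sum += Math.log((freq + mu * collProb.get(term)) / (docLen + mu));
        }
        return sum;
    }

    // Document-dependent part of equation 6:
    // matched-term weights plus n * log(alpha_d), with alpha_d = mu / (docLen + mu).
    static double documentDependentScore(String[] query, Map<String, Integer> docFreqs,
                                         int docLen, Map<String, Double> collProb,
                                         double mu) {
        double sum = 0.0;
        for (String term : query) {
            int freq = docFreqs.getOrDefault(term, 0);
            if (freq > 0) {
                sum += Math.log(1 + freq / (mu * collProb.get(term)));
            }
        }
        // alpha_d varies with document length, so this term cannot be dropped.
        sum += query.length * Math.log(mu / (docLen + mu));
        return sum;
    }

    // Document-independent part: sum over all query terms of log p(q_i|C).
    static double collectionConstant(String[] query, Map<String, Double> collProb) {
        double sum = 0.0;
        for (String term : query) {
            sum += Math.log(collProb.get(term));
        }
        return sum;
    }

    public static void main(String[] args) {
        // Made-up query, document counts and collection probabilities.
        String[] query = {"lucene", "smoothing", "model"};
        Map<String, Integer> docFreqs = Map.of("lucene", 3, "model", 1);
        Map<String, Double> collProb =
                Map.of("lucene", 0.001, "smoothing", 0.0005, "model", 0.01);
        int docLen = 50;
        double mu = 2000.0;

        double full = fullLogLikelihood(query, docFreqs, docLen, collProb, mu);
        double dependent = documentDependentScore(query, docFreqs, docLen, collProb, mu);
        double constant = collectionConstant(query, collProb);

        System.out.println("full log p(q|d)      = " + full);
        System.out.println("dependent + constant = " + (dependent + constant));
    }
}
```

Here too the two printed values agree up to floating-point error. The only
part that can be ignored for ranking is the sum of log p(q_i|C).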
Peter
On Tue, Apr 2, 2013 at 12:46 PM, Zeynep P. <zpvie@yahoo.com> wrote:
> Hi,
>
> I have the same question about the LMJelinekMercerSimilarity class.
>
> protected float score(BasicStats stats, float freq, float docLen) {
>   return stats.getTotalBoost() *
>       (float) Math.log(1 + ((1 - lambda) * freq / docLen) /
>           (lambda * ((LMStats) stats).getCollectionProbability()));
> }
>
> score = Math.log((1 - lambda) * freq / docLen + lambda *
> ((LMStats) stats).getCollectionProbability())
>
> I am also getting much worse results after updating the code as above.
>
> Why is it calculated this way?
>
> Thanks in advance,
>
> Best regards,
> ZP
>
> P.S: Instead of creating a new question, I used your question because I
> believe that the reason should be the same.
