lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ype Kingma <ykin...@xs4all.nl>
Subject Re: Result scoring question
Date Thu, 15 Apr 2004 07:00:18 GMT
On Wednesday 14 April 2004 20:55, Armbrust, Daniel C. wrote:
> I should have remembered that.
>
> Here are the 3 explanations for the top 3 documents returned (contents
> below)
>
> 3.3513687 = product of:
>   6.7027373 = weight(preferred_designation:"renal calculus" in 48270),
> product of: 0.8114604 = queryWeight(preferred_designation:"renal
> calculus"), product of: 18.88021 = idf(preferred_designation: renal=1111
> calculus=37) 0.04297941 = queryNorm
>     8.260092 = fieldWeight(preferred_designation:"renal calculus" in
> 48270), product of: 1.0 = tf(phraseFreq=1.0)
>       18.88021 = idf(preferred_designation: renal=1111 calculus=37)
>       0.4375 = fieldNorm(field=preferred_designation, doc=48270)
>   0.5 = coord(1/2)
>
> 2.8726017 = product of:
>   5.7452035 = weight(preferred_designation:"renal calculus" in 514631),
> product of: 0.8114604 = queryWeight(preferred_designation:"renal
> calculus"), product of: 18.88021 = idf(preferred_designation: renal=1111
> calculus=37) 0.04297941 = queryNorm
>     7.080079 = fieldWeight(preferred_designation:"renal calculus" in
> 514631), product of: 1.0 = tf(phraseFreq=1.0)
>       18.88021 = idf(preferred_designation: renal=1111 calculus=37)
>       0.375 = fieldNorm(field=preferred_designation, doc=514631)
>   0.5 = coord(1/2)
>
> 2.4832542 = product of:
>   4.9665084 = weight(other_designation:"renal calculus" in 481129), product
> of: 0.58440757 = queryWeight(other_designation:"renal calculus"), product
> of: 13.5973835 = idf(other_designation: renal=8560 calculus=971) 0.04297941
> = queryNorm
>     8.498364 = fieldWeight(other_designation:"renal calculus" in 481129),
> product of: 1.0 = tf(phraseFreq=1.0)
>       13.5973835 = idf(other_designation: renal=8560 calculus=971)
>       0.625 = fieldNorm(field=other_designation, doc=481129)
>   0.5 = coord(1/2)
>
>
> Is there anything that I can do in my query construction, to ensure that if
> a query exactly matches a document, it will be the top result?

It seems that the problem is in the idf weights.
Try using a scorer that returns a constant for the idf.
You can inherit all the default behaviour and only override the idf().

The idf weights are established for Lucene terms, which are a combination
of a field and a text term. If a text term occurs infrequently in one field, it
will score higher than in a field in which it occurs frequently.
(idf means inverse document frequency).
My guess is this is what's happening here.


Good luck,
Ype


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message