lucene-java-user mailing list archives

From "Armbrust, Daniel C." <Armbrust.Dan...@mayo.edu>
Subject RE: Result scoring question
Date Thu, 15 Apr 2004 16:16:18 GMT
Thanks for the advice.

I created a class that extends DefaultSimilarity and made it return 10 for the idf value.  (I
don't really have any data to back up picking 10, other than that it seems to work.)
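
A minimal sketch of what such an override might look like. To keep it compilable without Lucene on the classpath, DefaultSimilaritySketch is a hypothetical stand-in for Lucene's DefaultSimilarity, and the class names and the CONSTANT_IDF field are assumptions for illustration, not the code actually used here. The stand-in's formula follows the classic default of log(numDocs / (docFreq + 1)) + 1.

```java
// Hypothetical stand-in for Lucene's DefaultSimilarity, so this sketch
// compiles on its own. In a real project you would extend the Lucene class.
class DefaultSimilaritySketch {
    // Classic default idf: log(numDocs / (docFreq + 1)) + 1
    public float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }
}

// Inherits all default behaviour and overrides only idf().
class ConstantIdfSimilarity extends DefaultSimilaritySketch {
    // 10 is the value mentioned in this message; it was chosen empirically.
    static final float CONSTANT_IDF = 10.0f;

    @Override
    public float idf(int docFreq, int numDocs) {
        // Ignore document frequency entirely, so a term is weighted the
        // same no matter how rare or common it is in a given field.
        return CONSTANT_IDF;
    }
}

class ConstantIdfDemo {
    public static void main(String[] args) {
        ConstantIdfSimilarity sim = new ConstantIdfSimilarity();
        // Both calls return the same constant regardless of the arguments.
        System.out.println(sim.idf(2, 10000));
        System.out.println(sim.idf(300, 10000));
    }
}
```

With the real library, the subclass would presumably be installed on the searcher before running the query (for example through its setSimilarity method); the exact idf() signature and wiring depend on the Lucene version in use.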

This did indeed cause my exact matches to float up to the top.  Your explanation makes sense,
because for this particular query, there were only 2 documents in the index that contained
the words "renal calculus" in the preferred_designation field, while there were hundreds that
contained those words in the other_designation field.

I'll keep testing it to make sure that nothing odd happens in other searches now, but it seems
good so far.

Thanks, 

Dan



************************************ 
-----Original Message-----
From: Ype Kingma [mailto:ykingma@xs4all.nl] 
Sent: Thursday, April 15, 2004 2:00 AM
To: Lucene Users List
Subject: Re: Result scoring question


It seems that the problem is in the idf weights.
Try using a scorer that returns a constant for the idf.
You can inherit all the default behaviour and only override the idf().

The idf weights are established for Lucene terms, which are a combination
of a field and a text term. If a text term occurs infrequently in one field, it
will score higher than in a field in which it occurs frequently.
(idf means inverse document frequency).
My guess is this is what's happening here.
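
To illustrate how the per-field idf can differ for the same text, here is a small self-contained sketch. It assumes the classic default formula, log(numDocs / (docFreq + 1)) + 1, and uses made-up numbers: an index of 10,000 documents, with the text occurring in 2 documents in one field and in 300 documents in another. The class and method names are hypothetical.

```java
class IdfPerFieldSketch {
    // Assumed default idf formula: log(numDocs / (docFreq + 1)) + 1
    static float defaultIdf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        int numDocs = 10000;          // assumed index size
        int rareFieldDocFreq = 2;     // text occurs in few documents in this field
        int commonFieldDocFreq = 300; // assumed value for a field where it is common

        float rareIdf = defaultIdf(rareFieldDocFreq, numDocs);
        float commonIdf = defaultIdf(commonFieldDocFreq, numDocs);

        // The same text gets a larger idf in the field where it is rare.
        System.out.println("idf where rare:   " + rareIdf);
        System.out.println("idf where common: " + commonIdf);
    }
}
```

Because a Lucene term is a (field, text) pair, each field gets its own document frequency, which is why the two weights differ. Returning a constant from idf() removes that difference.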


Good luck,
Ype


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


