lucene-java-user mailing list archives

From Alex Murzaku <murz...@yahoo.com>
Subject Re: Scoring
Date Mon, 02 Sep 2002 13:31:46 GMT
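Lucene uses a variation of TF-IDF for the similarity score, and
document length is one of the factors: a match in a short document
counts for more than the same match in a long one, which is why your
short document can outrank the longer one. In the future you will be
able to modify scoring to your needs: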
http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg01727.html
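
To make the effect of length normalization concrete, here is a rough
sketch in plain Java. It is not Lucene's code and does not use the
Lucene API. The class and method names are made up for illustration,
and the formulas (square root of the term frequency, a log-based IDF,
and 1/sqrt(number of terms) as the length factor) are assumptions
modeled on the usual TF-IDF weighting, so treat the numbers as
indicative only.

```java
// Hypothetical illustration only: not Lucene's implementation.
public class LengthNormSketch {

    // Term-frequency factor: grows with occurrences, but sub-linearly.
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Inverse document frequency: rarer terms weigh more.
    static double idf(int docFreq, int numDocs) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    // Length factor: longer documents are scaled down.
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Score of one term in one document, with or without the length factor.
    static double score(int freq, int numTerms, int docFreq, int numDocs,
                        boolean useLengthNorm) {
        double norm = useLengthNorm ? lengthNorm(numTerms) : 1.0;
        return tf(freq) * idf(docFreq, numDocs) * norm;
    }

    public static void main(String[] args) {
        int numDocs = 100; // assumed collection size
        int docFreq = 2;   // assumed number of documents containing the term

        // Short document: 10 terms, 1 occurrence.
        // Long document: 100 terms, 2 occurrences.
        System.out.println("with length norm:    short="
                + score(1, 10, docFreq, numDocs, true)
                + " long=" + score(2, 100, docFreq, numDocs, true));
        System.out.println("without length norm: short="
                + score(1, 10, docFreq, numDocs, false)
                + " long=" + score(2, 100, docFreq, numDocs, false));
    }
}
```

With the length factor the short document scores higher. With the
factor held at 1.0 the document with two occurrences comes out ahead,
which is the ordering asked for in the question.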

There is more in the FAQ and in the mail archive, since this has been
discussed quite often. For example, the scoring mechanism is described
in:
http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi?file=chapter.search&toc=faq#q31

--- Chris Sibert <chrissibert@attbi.com> wrote:
> I am dissatisfied with the document scores that I'm getting. If a
> document is short, and has one occurrence of the search term, it is
> ranked higher than a longer document with two occurrences of the
> term. This makes little sense to me, and I'd like the longer document
> with more occurrences to be ranked higher. I figured I have to
> override the scoring method, but I can't find where Lucene actually
> does the scoring. This is actually not an uncommon problem for me, as
> I find perusing the API to be high on the confusing scale, due to the
> lack of comprehensive Javadoc documentation. (Something that even Sun
> doesn't spend much time on.) I attempt to read the code, but variable
> names are terse, and there's a dearth of commenting, which makes it
> fairly unfathomable. 


=====
__________________________________
alex@lissus.com -- http://www.lissus.com
