lucene-dev mailing list archives

From cutt...@lucene.com
Subject RE: cvs commit: jakarta-lucene TODO.txt
Date Tue, 28 May 2002 21:39:37 GMT
> From: Alex Murzaku [mailto:murzaku.at.yahoo.com@cutting.at.lucene.com]
>
> Second, never heard from Doug whether it is possible
> in theory to implement some other similarity/distance
> function and to plug these instead of the standard
> enhanced tf*idf. I think there was at least one other
> Lucene user interested in this (especially in the case
> of short fields like addresses, single sentences,
> etc.)

Some scoring changes are hard, some are easy.

Relatively easy things:
 - changing per-term factor in score -- currently idf, i.e.,
log(numDocs/(df+1))+1
 - changing factor based on term's freq within document -- currently
sqrt(tf)
 - changing the coordination factor, the boost a hit gets for containing a
large percentage of the query terms.

These all correspond to methods in Similarity.java.  These could be made
into a TermWeight interface, with a default implementation, and a way to
specify an alternate implementation when building a searcher.
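
As a rough illustration of what such an interface might look like, here is
a minimal sketch.  The names TermWeight, DefaultTermWeight and
BinaryTfTermWeight are hypothetical and do not exist in Lucene; the default
formulas follow the ones listed above, and the coordination factor is
assumed to be the fraction of query terms matched.

```java
// Hypothetical sketch: none of these types are part of Lucene.
interface TermWeight {
    /** Per-term factor; docFreq is the number of documents containing the term. */
    float idf(int docFreq, int numDocs);

    /** Factor based on the term's frequency within a document. */
    float tf(int freq);

    /** Coordination factor: rewards hits that contain more of the query terms. */
    float coord(int overlap, int maxOverlap);
}

/** Default implementation using the formulas quoted in the message. */
class DefaultTermWeight implements TermWeight {
    public float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    public float tf(int freq) {
        return (float) Math.sqrt(freq);
    }

    // Assumption: coord is the matched fraction of query terms.
    public float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }
}

/** Alternate implementation for short fields: repeated occurrences add nothing. */
class BinaryTfTermWeight extends DefaultTermWeight {
    @Override
    public float tf(int freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }
}

class TermWeightDemo {
    public static void main(String[] args) {
        TermWeight weight = new BinaryTfTermWeight();
        System.out.println(weight.tf(9)); // prints 1.0
    }
}
```

A searcher would then take a TermWeight when it is built; how that would be
wired into the existing classes is left open here.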

Somewhat harder:
 - changing per-document factor in score -- currently 1/sqrt(docLength).  This
is also a Similarity method, but it is called when the index is created, so
its implementation cannot be changed at search time.

The scoring formula sums products of all these factors.
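
To make the "sum of products" shape concrete, here is a small worked
sketch.  It is not Lucene's scorer: the class and method names are made up,
the per-document length factor is passed in as a plain number, and applying
the coordination factor to the whole sum is an assumption.

```java
// Hypothetical illustration of a sum-of-products score; not Lucene code.
class SumOfProductsDemo {
    /**
     * freqs[i]    = frequency of query term i in the document (0 if absent)
     * docFreqs[i] = number of documents containing query term i
     * lengthFactor = precomputed per-document factor (fixed at index time)
     */
    static double score(int[] freqs, int[] docFreqs, int numDocs, double lengthFactor) {
        if (freqs.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        int matched = 0;
        for (int i = 0; i < freqs.length; i++) {
            if (freqs[i] == 0) {
                continue; // term not in this document, contributes nothing
            }
            matched++;
            double idf = Math.log(numDocs / (double) (docFreqs[i] + 1)) + 1.0;
            double tf = Math.sqrt(freqs[i]);
            sum += tf * idf * lengthFactor; // one product per matching term
        }
        // Assumption: coordination factor is the fraction of query terms matched.
        double coord = matched / (double) freqs.length;
        return coord * sum;
    }

    public static void main(String[] args) {
        // Two query terms, one of which occurs 4 times in the document.
        System.out.println(score(new int[] {4, 0}, new int[] {9, 9}, 10, 0.5));
    }
}
```

With these inputs idf is log(10/10)+1 = 1 and tf is sqrt(4) = 2, so the one
matching term contributes 2 * 1 * 0.5 = 1, and matching one of two query
terms halves that to 0.5.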

Harder-yet things:
 - change the form of the scoring formula itself: a fair amount of code
assumes that scores are sums of products of the above factors.  It would
be challenging to design things both so that the formula can be easily
altered and so that things are efficient.  I think if folks really want to
change the formula fundamentally, they're best off using IndexReader
directly and writing a search algorithm from scratch.

So what in particular are you interested in altering?  Would you be
satisfied with the addition of a TermWeight interface?

Doug


