lucene-java-user mailing list archives

From Soeren Pekrul <soeren.pek...@gmx.de>
Subject Re: Lucene scoring: coord_q_d factor
Date Thu, 14 Dec 2006 09:59:51 GMT
Soeren Pekrul wrote:
> The score for a document is the sum of the term weights w(tf, idf) for
> each query term it contains. So you already have a combination of
> coordination-level matching with IDF. Now suppose your query
> requests three terms A, B and C. Two of them (A and B) occur quite often
> in the collection; one (C) is very rare. It is then possible that
> documents matching just C get a higher score than documents
> containing A and B. To avoid this, you can give the coordination a higher
> influence by multiplying the sum of term weights by the coordination
> as an additional factor.

Addendum:
For the query Q(A, B, C) with
A: df++ (idf--)
B: df++ (idf--)
C: df-- (idf++)
the user would probably expect the following ranking:
1. D(A, B, C)
2. D(A, C), D(B, C)
3. D(A, B)
4. D(C)
5. D(A), D(B)
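
To make the ranking above concrete, here is a small sketch. It is not Lucene's actual scoring formula: the term weights are made-up idf-like numbers, tf is taken as 1 for every term, and the coordination factor is assumed to be (matched query terms) / (total query terms). The names TERM_WEIGHT, score and ranking are only for illustration.

```python
from itertools import combinations

# Hypothetical idf-like weights: A and B are common (low), C is rare (high).
TERM_WEIGHT = {"A": 1.0, "B": 1.0, "C": 3.0}
QUERY = ("A", "B", "C")


def score(doc_terms, use_coord=True):
    """Sum of the weights of the matching terms, optionally scaled by coord."""
    matched = [t for t in QUERY if t in doc_terms]
    total = sum(TERM_WEIGHT[t] for t in matched)
    # Assumed coordination factor: fraction of the query terms that match.
    coord = len(matched) / len(QUERY) if use_coord else 1.0
    return coord * total


def ranking(use_coord):
    """Rank every non-empty combination of query terms, best score first."""
    docs = [frozenset(c) for n in (1, 2, 3) for c in combinations(QUERY, n)]
    ordered = sorted(docs, key=lambda d: score(d, use_coord), reverse=True)
    return ["".join(sorted(d)) for d in ordered]


if __name__ == "__main__":
    print("sum only:  ", ranking(use_coord=False))
    print("with coord:", ranking(use_coord=True))
```

With these particular weights the plain sum puts D(C) ahead of D(A, B), while the version with the coordination factor gives the order listed above. This depends on the numbers: if C is rare enough (a weight of 5.0 in this sketch), D(C) still ends up ahead of D(A, B) even with the factor, so the coordination would need more influence than a simple linear multiplier.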

Sören
