lucene-java-user mailing list archives

From Grant Ingersoll
Subject Re: Lucene scoring: coord_q_d factor
Date Thu, 14 Dec 2006 12:31:27 GMT
FYI: The Wiki has a fair number of resources on IR: http://  (I have added a  
link to this conversation, which contains a lot of useful information)

Karl, if you are so inclined, please feel free to add any of the  
references you have found helpful that aren't already  
on this page (anyone can edit the Wiki with a login).


On Dec 14, 2006, at 4:59 AM, Soeren Pekrul wrote:

> Soeren Pekrul wrote:
>> The score for a document is the sum of the term weights w(tf, idf)  
>> for each query term it contains. So you already have a combination  
>> of coordination level matching with IDF. Now suppose your  
>> query requests three terms A, B and C. Two of them (A and B) occur  
>> quite often in the collection; one (C) is very rare. It is then  
>> possible that documents matching just C get a higher score  
>> than documents containing A and B. To avoid this, you can give the  
>> coordination a higher influence by multiplying the sum of term  
>> weights by the coordination as an additional factor.
> Addendum:
> For the query Q(A, B, C) with
> A: df++ (idf--)
> B: df++ (idf--)
> C: df-- (idf++)
> the user would probably expect the following ranking:
> 1. D(A, B, C)
> 2. D(A, C), D(B, C)
> 3. D(A, B)
> 4. D(C)
> 5. D(A), D(B)
> Sören
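
The effect described in the quoted message can be checked with a small, self-contained sketch. This is not Lucene code: the class name CoordDemo, the weights (1.0 for the common terms A and B, 3.0 for the rare term C), and the simplification tf = 1 are all assumptions made for illustration. The coordination factor is taken to be the fraction of query terms a document matches.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CoordDemo {
    // Hypothetical IDF-like weights: A and B are common (low weight), C is rare (high weight).
    static final Map<String, Double> IDF = Map.of("A", 1.0, "B", 1.0, "C", 3.0);

    // Sum of the weights of the query terms the document contains (tf assumed to be 1).
    static double sumOfWeights(Set<String> docTerms, Map<String, Double> idf) {
        double sum = 0.0;
        for (Map.Entry<String, Double> e : idf.entrySet()) {
            if (docTerms.contains(e.getKey())) {
                sum += e.getValue();
            }
        }
        return sum;
    }

    // Coordination factor: matched query terms divided by total query terms.
    static double coord(Set<String> docTerms, Map<String, Double> idf) {
        int overlap = 0;
        for (String term : idf.keySet()) {
            if (docTerms.contains(term)) {
                overlap++;
            }
        }
        return (double) overlap / idf.size();
    }

    // Sum of term weights multiplied by the coordination factor.
    static double scoreWithCoord(Set<String> docTerms, Map<String, Double> idf) {
        return coord(docTerms, idf) * sumOfWeights(docTerms, idf);
    }

    public static void main(String[] args) {
        List<Set<String>> docs = List.of(
                Set.of("A", "B", "C"), Set.of("A", "C"), Set.of("A", "B"),
                Set.of("C"), Set.of("A"));
        for (Set<String> doc : docs) {
            System.out.printf("%-12s sum=%.3f  sum*coord=%.3f%n",
                    doc, sumOfWeights(doc, IDF), scoreWithCoord(doc, IDF));
        }
    }
}
```

With these assumed weights, the plain sum ranks D(C) above D(A, B), while the sum multiplied by the coordination factor ranks D(A, B) above D(C) and gives the order listed in the addendum. Other weights can behave differently: if C were rare enough, D(C) would still outrank D(A, B) even with the extra factor.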

Grant Ingersoll
Center for Natural Language Processing
