lucene-java-user mailing list archives

From "Karl Koch" <>
Subject About Combining Scores
Date Sat, 12 Nov 2005 23:04:41 GMT
Hello Lucene experts,

I am working on a perhaps interesting problem. I am using Lucene as an IR
engine that allows users to search for documents. Additionally, I use a user
model that produces a second score. This second score represents a different
aspect of document relevance, based on data from a previous experiment. This
score, however, is based on data other than the content and has nothing to do
with the TF/IDF formula used in the Lucene engine. You may think of it as an
additional model that also produces a score but is based on another view of
relevance, grounded statistically in user opinion rather than
deterministically in word counts and the distribution of words over the
document.

My aim is to combine these two scores. The Lucene score is normalised to
between 0.0 and 1.0 (if the score exceeded 1.0 at some point) or is already
less than 1.0 (if it did not). The user model looks the same in this respect,
although it is based on different data: 1.0 means maximum relevance and 0.0
means minimum relevance. At the moment I am multiplying the Lucene score by
the score produced by the user model. This means the resulting combined
score is a number between 0.0 and 1.0 and represents the merged view of
both models: the IR view and the view of the user model.
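
A minimal sketch of the multiplication described above, in Python rather
than Lucene's Java for brevity. The function names and the sample numbers
are hypothetical, and the normalisation step assumes that raw scores are
divided by the top score only when that top score exceeds 1.0:

```python
def normalise_lucene_scores(raw_scores):
    """Scale scores by the top score, but only if the top score exceeds 1.0.

    Assumption: this mirrors the normalisation described in the message;
    it is not a call into Lucene itself.
    """
    top = max(raw_scores, default=0.0)
    if top > 1.0:
        return [s / top for s in raw_scores]
    return list(raw_scores)


def combine_by_product(lucene_score, user_score):
    """Combine two scores in [0.0, 1.0] by multiplying them."""
    return lucene_score * user_score


# Hypothetical raw Lucene scores and user-model scores for three documents.
raw = [2.0, 1.0, 0.5]
user = [0.5, 1.0, 0.8]

lucene = normalise_lucene_scores(raw)
combined = [combine_by_product(l, u) for l, u in zip(lucene, user)]
```

Because both inputs lie in [0.0, 1.0], the product does too, and it can
never exceed the smaller of the two scores.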

Regarding this, I have a question:

Multiplying both scores seemed obvious to me until recently, mainly because
I have seen it done before and because it seemed to deliver good results in
initial testing. But this is a weak assumption, and I am nervous when it
comes to the mathematical foundation, or at least a decent justification
for it. Does somebody here know of similar work, or has anyone worked on
similar issues and can share some ideas or perhaps point me to some papers
that address these issues? I would be interested in discussing the issue of
score combination in general. My particular problem is that I do not stay in
the pure IR field (which is covered by the IR literature) but combine it with
other models. More generally, I would like to know your opinion on whether
this is a good idea or not. The technical setup somehow requires me to
combine these two scores, which are explicitly independent. I am now looking
for a decent way to do that so that the meaning of the TF/IDF model is not
violated.

Mathematically, I have to find the function

    f(L, U)

where L is the Lucene TF/IDF model and U my user model.
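
To make the question concrete, here is a small Python sketch that treats
the combining function of L and U as a pluggable parameter. The three
candidates, the weight alpha, and the sample documents are all hypothetical
illustrations, not a recommendation and not anything Lucene provides:

```python
def product(l, u):
    """Plain multiplication, the approach currently in use."""
    return l * u


def weighted_linear(l, u, alpha=0.5):
    """Linear interpolation; alpha is a hypothetical weight on the Lucene score."""
    return alpha * l + (1.0 - alpha) * u


def weighted_geometric(l, u, alpha=0.5):
    """Weighted geometric mean; equals sqrt(l * u) when alpha is 0.5."""
    return (l ** alpha) * (u ** (1.0 - alpha))


def rank(docs, f):
    """Return document ids ordered by the combined score f(L, U), best first."""
    ordered = sorted(docs, key=lambda d: f(d[1], d[2]), reverse=True)
    return [doc_id for doc_id, _, _ in ordered]


# (id, Lucene score L, user-model score U), all assumed to lie in [0.0, 1.0].
docs = [("a", 0.9, 0.1), ("b", 0.6, 0.5), ("c", 0.3, 0.9)]

by_product = rank(docs, product)            # ['b', 'c', 'a']
by_linear = rank(docs, weighted_linear)     # ['c', 'b', 'a']
by_geometric = rank(docs, weighted_geometric)
```

All three candidates keep the result in [0.0, 1.0], yet the product and the
linear mix already order these sample documents differently, which is why
the choice of function needs a justification.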

Kind Regards,
