lucene-java-user mailing list archives

From "Karl Koch" <TheRan...@gmx.net>
Subject Re: About Combining Scores
Date Sun, 13 Nov 2005 20:44:49 GMT
Hello Sebastian,

Thank you for sharing your experience. I am happy that I am not the only
person with this problem.

I have read the earlier paper by Robertson et al.

http://citeseer.ist.psu.edu/robertson04simple.html

in which they describe the danger of combining scores and propose a
solution: a linear combination of the TFs before they are fed into the BM25
weighting algorithm. However, this does not apply to my/your problem, since
it works only within a single scoring function, whereas in my case there are
two kinds of scoring that are different in nature.
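
For readers who have not seen the paper, here is a minimal Python sketch of
the idea as I understand it: weight and add the per-field term frequencies
first, then apply the BM25 saturation once, instead of scoring each field
separately and adding the scores. The function names, the IDF variant and
the defaults k1=1.2, b=0.75 are my assumptions for illustration, not taken
from the paper or from Lucene. The sketch also takes the document length as
given, whereas the paper, as far as I recall, weights the field lengths too.

```python
import math


def bm25_saturation(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """BM25 term-frequency component: grows with tf but saturates."""
    norm = k1 * ((1.0 - b) + b * doc_len / avg_doc_len)
    return tf * (k1 + 1.0) / (tf + norm)


def idf(df, n_docs):
    """One common smoothed IDF variant (an assumption for this sketch)."""
    return math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)


def combined_field_weight(field_tfs, field_weights, doc_len, avg_doc_len,
                          df, n_docs):
    """Combine the per-field TFs linearly, then apply BM25 once."""
    tf = sum(w * field_tfs.get(f, 0) for f, w in field_weights.items())
    return idf(df, n_docs) * bm25_saturation(tf, doc_len, avg_doc_len)


def summed_field_scores(field_tfs, field_weights, doc_len, avg_doc_len,
                        df, n_docs):
    """The alternative: score each field on its own, then add the scores."""
    total = sum(
        w * bm25_saturation(field_tfs.get(f, 0), doc_len, avg_doc_len)
        for f, w in field_weights.items()
    )
    return idf(df, n_docs) * total


# Hypothetical example: a term occurring once in the title, 3 times in the body.
tfs = {"title": 1, "body": 3}
weights = {"title": 1.0, "body": 1.0}
combined = combined_field_weight(tfs, weights, 100, 100, df=10, n_docs=1000)
summed = summed_field_scores(tfs, weights, 100, 100, df=10, n_docs=1000)
```

With equal weights the summed variant comes out higher than the combined
one, because each field gets its own unsaturated start. That is the effect
the linear TF combination avoids.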

I think the paper you suggested might be closer to my needs, although I
doubt it is close enough to inspire me or even to provide some mathematical
justification for simple operations between two scores (like
multiplication).

Are you aware of any mathematical justification for multiplying the two
scores? Did you have any other motivation behind it besides its simplicity?

Thank you in advance!

Kind Regards,
Karl




Regarding your solution, do you have a publication, or is one planned,
about what you did?

> --- Original Message ---
> From: Sebastian Marius Kirsch <skirsch@sebastian-kirsch.org>
> To: java-user@lucene.apache.org
> Subject: Re: About Combining Scores
> Date: Sun, 13 Nov 2005 10:10:22 +0100
> 
> On Sun, Nov 13, 2005 at 12:04:41AM +0100, Karl Koch wrote:
> > My aim is to combine these two scores. The Lucene score is normalised to
> > lie between 0.0 and 1.0 (if the score exceeded 1.0 at some point) or is
> > less than 1.0 (if it did not). The user model looks the same in this
> > respect, although it is based on different data: 1.0 means maximum
> > relevance and 0.0 minimum relevance. At the moment I am multiplying the
> > Lucene score by the score produced by the user model. This means the
> > resulting combined score is a number between 0.0 and 1.0 and represents
> > the merged view of both models: the IR view and the view of the user
> > model.
> 
> I came across that question too recently; it seems to be a rather
> under-researched topic in the literature. I used multiplication in the
> end, because it's simple, it produces reasonable results, it's not
> tunable, and it's invariant to normalization. (Don't make a model with
> tunable parameters if you don't know how to tune them ...)
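
One way to read "invariant to normalization" is that rescaling one score
by a positive constant leaves the ranking by product unchanged, while the
ranking by sum can change. A small Python sketch with made-up document
names and scores (all of them hypothetical):

```python
def rank_by(docs, combine):
    """Return document names ordered by the combined score, best first."""
    ordered = sorted(docs, key=lambda d: combine(d[1], d[2]), reverse=True)
    return [name for name, _, _ in ordered]


def rescale_ir(docs, factor):
    """Multiply every IR score by a positive constant."""
    return [(name, ir * factor, user) for name, ir, user in docs]


def product(ir, user):
    return ir * user


def total(ir, user):
    return ir + user


# (name, IR score, user-model score); invented numbers for illustration.
docs = [("a", 0.9, 0.2), ("b", 0.5, 0.5), ("c", 0.3, 0.9)]
scaled = rescale_ir(docs, 0.1)

product_before = rank_by(docs, product)
product_after = rank_by(scaled, product)
sum_before = rank_by(docs, total)
sum_after = rank_by(scaled, total)
```

Here the product ranking stays the same after rescaling, while the sum
ranking swaps "a" and "b". Note that this covers scaling only. A
normalisation that also shifts the scores (min-max, for example) can change
the product ranking too.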
> 
> The most helpful paper I came across was this:
> 
> http://trec.nist.gov/pubs/trec13/papers/microsoft-cambridge.web.hard.pdf
> 
> It's about combining PageRank with a relevance score, but it contains
> a good description of how they arrived at their scoring formula. They
> use a linear combination of the two measures and transform them to
> have a roughly similar distribution. They then tuned the parameters
> using a test corpus (which may be difficult/impossible for your
> application.) Their system was one of the best at TREC-13.
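
As a rough illustration of the general approach (not the paper's actual
formula), the Python sketch below brings both scores onto a comparable
scale and then mixes them linearly. The z-score standardisation is swapped
in here as a simple stand-in for the transformations the paper describes,
and the weight `w` is a hypothetical parameter that would have to be tuned
on a test corpus.

```python
from statistics import mean, pstdev


def standardize(values):
    """Shift and scale the values to zero mean and unit standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are equal, so they carry no ranking information.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def linear_combination(ir_scores, user_scores, w=0.5):
    """Mix the standardised scores: w=0 is IR only, w=1 is user model only."""
    ir_z = standardize(ir_scores)
    user_z = standardize(user_scores)
    return [(1.0 - w) * i + w * u for i, u in zip(ir_z, user_z)]


# Invented scores for three documents.
ir = [0.9, 0.5, 0.3]
user = [0.2, 0.5, 0.9]
mixed = linear_combination(ir, user, w=0.5)
```

Unlike multiplication, this has a parameter that needs tuning, which is the
trade-off mentioned above.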
> 
> Regards, Sebastian
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

