lucene-java-user mailing list archives

From Chris Hostetter
Subject RE: Newbie questions re: scoring
Date Fri, 05 May 2006 00:52:09 GMT

: That link appears to be referring to normalized scores (everything is <
: 1.0).  Is it also not safe to use a threshold for raw scores?

Nope.  The basic flaw in comparing scores between two queries still holds
... early messages in the threads linked to go into more detail, but as I
recall, the basic problem has to do with the way idf and docFreq come into
play.  Just because a term query for foo:bar says that document A has a
score of 2.2 and B has a score of 6.6, and a term query for yak:baz says
that document X has a score of 2.2 and Y has a score of 6.6, doesn't mean
X is as relevant to yak:baz as A is to foo:bar -- it just means that the
relative quality of B compared to A is the same as the relative quality of
Y compared to X for their respective queries.  (once they're normalized,
even that goes out the window)
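
To make the idf/docFreq point concrete, here is a rough sketch.  It uses a
simplified TF-IDF style formula (sqrt(tf) * idf^2) that I'm assuming purely
for illustration; it is not Lucene's actual Similarity, and it leaves out
field norms, boosts and queryNorm.  The index size and the docFreq values
are made up.

```python
import math

NUM_DOCS = 1000  # hypothetical index size (assumption)

def idf(doc_freq, num_docs=NUM_DOCS):
    # Simplified inverse document frequency (an assumption for illustration).
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def raw_score(term_freq, doc_freq):
    # Toy term-query score: the term's idf is applied twice,
    # scaled by the square root of the term frequency in the document.
    return math.sqrt(term_freq) * idf(doc_freq) ** 2

# Hypothetical: foo:bar matches 9 docs (rare), yak:baz matches 499 (common).
score_a = raw_score(term_freq=1, doc_freq=9)    # doc A for foo:bar
score_b = raw_score(term_freq=9, doc_freq=9)    # doc B for foo:bar
score_x = raw_score(term_freq=1, doc_freq=499)  # doc X for yak:baz
score_y = raw_score(term_freq=9, doc_freq=499)  # doc Y for yak:baz

# Within each query, the ratio between documents is the same ...
ratio_foo = score_b / score_a
ratio_yak = score_y / score_x

# ... but the absolute scores live on different scales.
print(score_a, score_b, score_x, score_y, ratio_foo, ratio_yak)
```

In this toy model B/A and Y/X both come out to 3, yet a single occurrence
of the rare term (roughly 31.4) outscores nine occurrences of the common
one (roughly 8.6), so one fixed threshold can't mean the same thing for
both queries.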

The only way I can think of to fairly compare scores from queries for
foo:bar with queries for yak:baz is to normalize them relative to a maximum
possible score across the entire term query space -- but finding that
maximum is a pretty complicated problem just for simple term queries ...
when you start talking about more complicated query structures things
really get messy -- and even then it's only fair as long as the query
structures are identical; you can never compare the scores from apples and
oranges.
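
A brute-force sketch of that normalization, on a tiny in-memory corpus, might
look like the following.  Everything here is hypothetical: the documents, the
simplified sqrt(tf) * idf^2 formula (again, not Lucene's real scoring), and
the helper names.  It only handles single-term queries against one field,
and it walks every term/document pair, which is exactly the part that gets
expensive on a real index.

```python
import math
from collections import Counter

# Hypothetical toy corpus: doc id -> tokens of a single field.
docs = {
    "A": ["bar", "baz", "qux"],
    "B": ["bar", "bar", "bar", "bar", "qux"],
    "X": ["baz", "qux", "qux"],
}

def score(term, doc_id):
    # Toy term-query score, assumed for illustration only.
    tf = Counter(docs[doc_id])[term]
    if tf == 0:
        return 0.0
    df = sum(1 for tokens in docs.values() if term in tokens)
    idf = 1.0 + math.log(len(docs) / (df + 1))
    return math.sqrt(tf) * idf ** 2

# The "entire term query space" here is just every distinct token.
vocabulary = {term for tokens in docs.values() for term in tokens}

# Largest score any single-term query can give any document.
global_max = max(score(term, doc_id) for term in vocabulary for doc_id in docs)

def normalized(term, doc_id):
    # Score expressed as a fraction of the best possible term-query score.
    return score(term, doc_id) / global_max
```

With this, normalized("bar", "B") and normalized("baz", "X") sit on the same
0..1 scale, but only because both are plain term queries scored by the same
toy formula; nothing here carries over to boolean or phrase queries.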

