lucene-java-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject RE: Newbie questions re: scoring
Date Fri, 05 May 2006 00:52:09 GMT

: That link appears to be referring to normalized scores (everything is <
: 1.0).  Is it also not safe to use a threshold for raw scores?

Nope.  The basic flaw in comparing scores between two queries still holds
... early messages in the linked threads go into more detail, but as I
recall, the basic problem has to do with the way idf and docFreq come into
play.  Just because a term query for foo:bar says that document A has a
score of 2.2 and B has a score of 6.6, and a term query for yak:baz says
that document X has a score of 2.2 and Y has a score of 6.6, doesn't mean
X is as relevant to yak:baz as A is to foo:bar -- it just means that the
relative quality of B compared to A is the same as the relative quality of
Y compared to X for their respective queries.  (once they're normalized,
even that goes out the window)
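
To make the idf/docFreq point concrete, here is a rough sketch in Python
(not Lucene code).  It assumes the classic tf-idf shape, tf = sqrt(freq)
and idf = 1 + ln(numDocs / (docFreq + 1)), and it ignores field norms,
boosts and queryNorm.  The index size, the docFreq values and the function
names are all made up for illustration.

```python
import math

def idf(num_docs, doc_freq):
    # Assumed classic idf shape: rarer terms get a larger weight.
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def tf(freq):
    # Assumed classic tf shape: square root of the in-document frequency.
    return math.sqrt(freq)

def raw_term_score(freq, num_docs, doc_freq):
    # Simplified raw score for a single-term query: tf * idf^2.
    # Field norms, boosts and queryNorm are deliberately left out.
    weight = idf(num_docs, doc_freq)
    return tf(freq) * weight * weight

NUM_DOCS = 10000          # hypothetical index size

# foo:bar is a rare term (hypothetical docFreq of 9)
a = raw_term_score(1, NUM_DOCS, 9)    # document A: term occurs once
b = raw_term_score(9, NUM_DOCS, 9)    # document B: term occurs 9 times

# yak:baz is a common term (hypothetical docFreq of 999)
x = raw_term_score(1, NUM_DOCS, 999)  # document X: term occurs once
y = raw_term_score(9, NUM_DOCS, 999)  # document Y: term occurs 9 times

print("foo:bar  A=%.2f  B=%.2f  B/A=%.2f" % (a, b, b / a))
print("yak:baz  X=%.2f  Y=%.2f  Y/X=%.2f" % (x, y, y / x))
```

Under these assumptions the ratios B/A and Y/X come out the same, but A
and X get very different absolute scores even though each contains its
term exactly once, so a single raw-score cutoff would treat the two
queries differently for reasons unrelated to the documents themselves.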

The only way I can think of to fairly compare scores from queries for
foo:bar with queries for yak:baz is to normalize them relative to a maximum
possible score across the entire term query space -- but finding that
maximum is a pretty complicated problem just for simple term queries ...
when you start talking about more complicated query structures it really
gets messy -- and even then it's only fair as long as the query structures
are identical; you can never compare the scores from apples and oranges.





-Hoss



