lucene-java-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: comparing lucene scores across queries
Date Mon, 28 Mar 2011 23:57:45 GMT

: I see, well if you say the norm isn't a problem for my case, I will just
: disable the coord factor by initializing BooleanQuery(true); and I should be
: done.

queryNorm shouldn't be a problem (since your BooleanQueries all have the 
same structure, and don't use query boosts ... i assume) but field norm 
might be; i also don't see anything mentioned so far in this thread that 
describes how you'll work around the tf and idf values being theoretically 
unbounded (unless your docs are all of identical length)

ultimately, attempts at comparing scores across different searches all 
come down to normalizing (either explicitly or implicitly) and normalizing 
requires that you have a "max possible score" you can normalize relative 
to -- not just a "max score for the index", but a max score in the scope 
of all theoretical documents (because otherwise the comparison isn't fair 
given an arbitrary corpus)
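
to make the normalization problem concrete, here is a minimal sketch in plain 
Java (no Lucene dependency). The class name, the helper normalizeByTop and the 
raw score values are all made up for illustration. It divides each result 
list by its own top score, which is the usual "max score for this search" 
shortcut, and shows that two searches with very different raw scores become 
indistinguishable afterwards.

```java
import java.util.Arrays;

public class TopScoreNormalization {

    // Hypothetical helper: divide every score by the largest score in the list.
    // This normalizes relative to the best hit of ONE search, not relative to
    // a "max possible score" over all theoretical documents.
    static double[] normalizeByTop(double[] scores) {
        double max = 0.0;
        for (double s : scores) {
            max = Math.max(max, s);
        }
        double[] out = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = max > 0.0 ? scores[i] / max : 0.0;
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up raw scores: search A has strong matches, search B weak ones.
        double[] searchA = {8.0, 4.0};
        double[] searchB = {0.2, 0.1};

        // After normalization the two lists look identical, so the normalized
        // values say nothing about which search matched better.
        System.out.println(Arrays.toString(normalizeByTop(searchA)));
        System.out.println(Arrays.toString(normalizeByTop(searchB)));
    }
}
```

the top hit of every search ends up at 1.0 no matter how good or bad it is, 
which is why a per-search (or even per-index) maximum isn't enough for a fair 
comparison.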

with the default similarity, you can't really define a "max possible 
score" for a given query because tf and idf are not bounded functions.
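
as a rough illustration, the sketch below re-implements the tf and idf 
formulas used by DefaultSimilarity in the 3.x line (tf = sqrt(freq), 
idf = log(numDocs / (docFreq + 1)) + 1) in plain Java so it runs without 
Lucene on the classpath. The class name and the sample numbers are 
assumptions made for the example; it is not a call into the Lucene API.

```java
public class UnboundedTfIdf {

    // Same formula as DefaultSimilarity.tf(float freq): square root of the
    // term frequency. It keeps growing as the term occurs more often.
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Same formula as DefaultSimilarity.idf(int docFreq, int numDocs).
    // For a fixed docFreq it keeps growing as the index gets larger.
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        // Longer documents can contain a term more often, so tf has no ceiling.
        for (int freq : new int[] {1, 100, 10_000, 1_000_000}) {
            System.out.println("freq=" + freq + " tf=" + tf(freq));
        }
        // A rare term in an ever larger corpus gets an ever larger idf.
        for (int numDocs : new int[] {1_000, 1_000_000, 1_000_000_000}) {
            System.out.println("numDocs=" + numDocs + " idf=" + idf(1, numDocs));
        }
    }
}
```

neither loop converges on a limit: whatever value you pick as the "max", 
a longer document or a bigger corpus can exceed it.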


There have been a few nice discussions about this general concept over the 
years, here's the first one i found doing a quick search...

http://www.gossamer-threads.com/lists/lucene/java-user/61075





-Hoss


