lucene-java-user mailing list archives

From "Peter Keegan" <peterlkee...@gmail.com>
Subject theoretical maximum score
Date Fri, 09 May 2008 13:02:54 GMT
Is it possible to compute a theoretical maximum score for a given query if
constraints are placed on 'tf' and 'lengthNorm'? If so, scores could be
compared to a 'perfect score' (a feature request from our customers).

Here are some related threads on this:

In this thread:

http://www.nabble.com/Newbie-questions-re%3A-scoring-td4228776.html#a4228776

Hoss writes:

> the only way I can think of to fairly compare scores from queries for
> foo:bar with queries for yak:baz is to normalize them relative a maximum
> possible score across the entire term query space -- but finding that
> maximum is a pretty complicated problem just for simple term queries ...
> when you start talking about more complicated query structures you really
> get messy -- and even then it's only fair as long as the query structures
> are identical, you can never compare the scores from apples and oranges

And in this thread:

http://www.nabble.com/non-relative-scoring-td8956299.html#a8956299

Walt writes:

> A tf.idf engine, like Lucene, might not have a maximum score.
> What if a document contains the word a thousand times?
> A million times?

It seems that if 'tf' is limited to a max value and 'lengthNorm' is a
constant, it might be possible, at least for 'simple' term queries. But Hoss
says that things get messy with more complicated queries.
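
As a rough sketch of the simple case only: the code below assumes the classic
TF-IDF formulas (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)),
queryNorm = 1 / sqrt(sum of squared term weights)), unboosted terms, a capped
term frequency, and a single constant norm for every document. The class and
method names (MaxScoreSketch, maxOrScore) and the parameters maxFreq and
constNorm are hypothetical, it does not call Lucene itself, and it only covers
a flat OR of terms where every term matches (so coord is 1).

```java
// Hypothetical sketch, not Lucene code: an upper bound on the score of a flat
// OR of unboosted terms, assuming classic TF-IDF formulas, a capped term
// frequency, and a constant length norm.
final class MaxScoreSketch {

    // Assumed tf formula: square root of the in-document term frequency.
    static double tf(double freq) {
        return Math.sqrt(freq);
    }

    // Assumed idf formula: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    // Best possible score when every term occurs maxFreq times in a document
    // whose norm is constNorm. All terms match, so coord is taken as 1.
    static double maxOrScore(int[] docFreqs, int numDocs,
                             double maxFreq, double constNorm) {
        double sumOfSquaredWeights = 0.0;
        for (int df : docFreqs) {
            double w = idf(df, numDocs);
            sumOfSquaredWeights += w * w;
        }
        double queryNorm = 1.0 / Math.sqrt(sumOfSquaredWeights);

        double score = 0.0;
        for (int df : docFreqs) {
            double i = idf(df, numDocs);
            // per-term contribution: tf * idf^2 * queryNorm * norm
            score += tf(maxFreq) * i * i * queryNorm * constNorm;
        }
        return score;
    }

    public static void main(String[] args) {
        // Made-up numbers purely for illustration.
        double max = maxOrScore(new int[] {120, 15}, 10000, 4.0, 1.0);
        System.out.println("theoretical max under these assumptions: " + max);
    }
}
```

An actual score divided by this value would give a 'fraction of perfect' for
that one query shape. Nested boolean clauses, boosts, phrase queries, and
per-field norms would each need their own bound.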

Could someone elaborate a bit? Does the index contain enough info to do this
efficiently?
I realize that score values must be interpreted 'carefully', but I'm seeing
a push to get more leverage from the absolute values, not just the relative
values.

Peter
