lucene-java-user mailing list archives

From Chris Hostetter
Subject Re: A question about scoring function in Lucene
Date Wed, 15 Dec 2004 20:14:16 GMT
: I question whether such scores are more meaningful.  Yes, such scores
: would be guaranteed to be between zero and one, but would 0.8 really be
: meaningful?  I don't think so.  Do you have pointers to research which
: demonstrates this?  E.g., when such a scoring method is used, that
: thresholding by score is useful across queries?

I freely admit that I'm way out of my league on these scoring discussions,
but I believe what the OP was referring to was not any intrinsic benefit in
having a score between 0 and 1, but the benefit of having a uniform
normalization of scores regardless of search terms.

For example, using the current scoring equation, if I do a search for
"Doug Cutting" and the results/scores I get back are...
      1:   0.9
      2:   0.3
      3:   0.21
      4:   0.21
      5:   0.1
...then there are at least two meaningful pieces of data I can glean:
   a) document #1 is significantly better than the other results
   b) documents #3 and #4 are both equally relevant to "Doug Cutting"

If I then do a search for "Chris Hostetter" and get back the following
      9:   0.9
      8:   0.3
      7:   0.21
      6:   0.21
      5:   0.1

...then I can assume the same corresponding information is true about my
new search term (#9 is significantly better, and #6/#7 are equally good)

However, I *cannot* say either of the following:
  x) document #9 is as relevant for "Chris Hostetter" as document #1 is
     relevant to "Doug Cutting"
  y) document #5 is equally relevant to both "Chris Hostetter" and
     "Doug Cutting"

I think the OP is arguing that if the scoring algorithm was modified in
the way they suggested, then you would be able to make statements x & y.
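
To make the distinction concrete, here is a rough sketch in Python. It is
not Lucene's scoring code: it simply assumes that each displayed score is a
raw score multiplied by a factor that depends only on the query. The
function names, the raw scores, and the factors are all invented for
illustration.

```python
# Hypothetical model: displayed score = raw score * per-query factor.
# All numbers below are made up; this is not how Lucene computes scores.

def displayed(raw_scores, query_factor):
    """Scale raw scores by a factor that depends only on the query."""
    return {doc: round(raw * query_factor, 2) for doc, raw in raw_scores.items()}

def ties(scores):
    """Return groups of documents that share a score within one result list."""
    groups = {}
    for doc, score in scores.items():
        groups.setdefault(score, []).append(doc)
    return [sorted(docs) for docs in groups.values() if len(docs) > 1]

# Invented raw scores: the "Doug Cutting" matches are much stronger.
doug_raw = {1: 9.0, 2: 3.0, 3: 2.1, 4: 2.1, 5: 1.0}
chris_raw = {9: 0.9, 8: 0.3, 7: 0.21, 6: 0.21, 5: 0.1}

doug = displayed(doug_raw, query_factor=0.1)
chris = displayed(chris_raw, query_factor=1.0)

# Within one list, ordering and ties survive the per-query scaling.
print("ties for Doug Cutting:   ", ties(doug))
print("ties for Chris Hostetter:", ties(chris))

# Across lists, identical displayed scores can hide very different raw scores.
print("same displayed top score:", doug[1] == chris[9])
print("raw top scores:          ", doug_raw[1], "vs", chris_raw[9])
```

Under that assumption, statements a and b hold for each list on its own,
but the matching 0.9 values say nothing about x, and the matching 0.1
values for document #5 say nothing about y.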

If they are correct, then I for one can see a definite benefit in that.
If for no other reason than in making minimum score thresholds more

