lucene-dev mailing list archives

From Paul Elschot <>
Subject Re: URL to compare 2 Similarity's ready-- Re: Scoring benchmark evaluation. Was RE: How to proceed with Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?
Date Wed, 02 Feb 2005 19:09:35 GMT
On Wednesday 02 February 2005 03:38, Chuck Williams wrote:
> Paul Elschot wrote:
>   > An alternative is to make sure all scores are bounded.
>   > Then the coordination factor can be implemented in the same bound
>   > while preserving the coordination order.
> If I understand this, I think more is required.  My normalization
> proposal from a couple months ago involved a boost-weighted
> term-coverage normalization of the raw scores (i.e., based on coords
> that are boost-weighted).  Raw scores would be bounded in [0.0, 1.0],
> unlike now where they are unbounded.  But one also needs a way to
> recover, from the score alone, critical quality information such as
> whether or not all terms were matched.  I was hoping to do this
> by simple thresholding, e.g. achieve a property like "results with all
> terms matched are always in [0.8, 1.0], and results missing a term
> always have a score less than 0.8".  I'm not certain whether or not that
> property can be obtained, but I feel confident that this would yield a
> pretty good absolute quality measure in any event.
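The message does not spell out the boost-weighted coord. The sketch below assumes it means "sum of the boosts of the matched query terms divided by the sum of all query-term boosts", which lies in [0, 1] and equals 1 exactly when every term matched (given positive boosts). The class and method names are hypothetical, and how this factor would be combined with the raw score is not shown.

```java
/** Hypothetical sketch of a boost-weighted coordination (term-coverage) factor. */
public final class WeightedCoverage {

    private WeightedCoverage() {}

    /**
     * Returns (sum of boosts of matched terms) / (sum of boosts of all terms).
     * Assumes every boost is positive, so the result is in [0, 1] and is
     * exactly 1.0 when all terms matched.
     */
    public static float coverage(float[] boosts, boolean[] matched) {
        if (boosts.length == 0 || boosts.length != matched.length) {
            throw new IllegalArgumentException("need one matched flag per query term");
        }
        float total = 0f;
        float hit = 0f;
        for (int i = 0; i < boosts.length; i++) {
            if (boosts[i] <= 0f) {
                throw new IllegalArgumentException("boosts must be positive");
            }
            total += boosts[i];
            if (matched[i]) {
                hit += boosts[i]; // only matched terms contribute their boost
            }
        }
        return hit / total;
    }

    public static void main(String[] args) {
        float[] boosts = {2.0f, 1.0f, 1.0f};
        System.out.println(coverage(boosts, new boolean[] {true, true, true}));  // 1.0
        System.out.println(coverage(boosts, new boolean[] {true, false, true})); // 0.75
    }
}
```

Missing a low-boost term costs less coverage than missing a high-boost one, which is the point of weighting the coord by boost.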

For scores bounded between 0 and b, I think this will do:

b * (score/b + nrMatchers - 1) / maxNrMatchers 

Java's float has enough room to accommodate the extra log2(nrMatchers)
bits from this.
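A minimal Java sketch of the formula above, assuming 1 <= nrMatchers <= maxNrMatchers and a raw score already in [0, b]; the class and method names are hypothetical. Since score/b is in [0, 1], adding nrMatchers - 1 places the value in the band [nrMatchers - 1, nrMatchers], and dividing by maxNrMatchers then scaling by b maps each band back into [0, b]. A document matching more terms therefore never scores below one matching fewer. A float's 24-bit significand is shared between the band index and the raw score, roughly log2(maxNrMatchers) bits going to the former.

```java
/** Hypothetical sketch of coordination within a fixed score bound b. */
public final class BoundedCoord {

    private BoundedCoord() {}

    /** Computes b * (score/b + nrMatchers - 1) / maxNrMatchers. */
    public static float coordinate(float score, float b, int nrMatchers, int maxNrMatchers) {
        if (b <= 0f || score < 0f || score > b) {
            throw new IllegalArgumentException("score must lie in [0, b] with b > 0");
        }
        if (nrMatchers < 1 || nrMatchers > maxNrMatchers) {
            throw new IllegalArgumentException("need 1 <= nrMatchers <= maxNrMatchers");
        }
        // score / b is in [0, 1]; the added nrMatchers - 1 selects the band,
        // and scaling by b / maxNrMatchers maps all bands back into [0, b].
        return b * (score / b + nrMatchers - 1) / maxNrMatchers;
    }

    public static void main(String[] args) {
        int maxNrMatchers = 3;
        // A near-perfect raw score with only 2 of 3 terms matched...
        float twoTermsBest = coordinate(0.99f, 1.0f, 2, maxNrMatchers);
        // ...still ranks below a weak raw score with all 3 terms matched.
        float threeTermsWeak = coordinate(0.01f, 1.0f, 3, maxNrMatchers);
        System.out.println(twoTermsBest < threeTermsWeak);
    }
}
```

Adjacent bands share an endpoint: a raw score of exactly b with n - 1 matchers ties a raw score of exactly 0 with n matchers, so the order is strict only away from those endpoints. This scheme also gives a thresholding property of the kind asked about earlier in the thread: with all terms matched the result lies in [b(maxNrMatchers - 1)/maxNrMatchers, b], and with any term missing it is at most b(maxNrMatchers - 1)/maxNrMatchers. The threshold depends on the number of query terms; for b = 1 it is 0.8 only when maxNrMatchers = 5.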

Paul Elschot.
