lucene-dev mailing list archives

From "Chuck Williams" <ch...@manawiz.com>
Subject RE: URL to compare 2 Similarity's ready-- Re: Scoring benchmark evaluation. Was RE: How to proceed with Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?
Date Wed, 02 Feb 2005 22:05:23 GMT

Paul Elschot wrote:
  > On Wednesday 02 February 2005 03:38, Chuck Williams wrote:
  > > I was hoping to do this by simple thresholding, e.g. achieve a
  > > property like "results with all terms matched are always in
  > > [0.8, 1.0], and results missing a term always have a score less
  > > than 0.8".  I'm not certain whether or not that property can be
  > > obtained, but feel confident that this would yield a pretty good
  > > absolute quality measure in any event.
  >
  > In case of scores bounded between 0 and b, this will do I think:
  >
  > b * (score/b + nrMatchers - 1) / maxNrMatchers
  >
  > Java's float has enough room to accommodate the extra
  > log2(nrMatchers) bits from this.

Right you are.  Nice solution.  So yes, normalizing the raw scores to
[0,1] allows a trivial extension that keeps coord information and
score information separately retrievable from the single score
value.  This makes the normalization all the more attractive.
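
For illustration only, here is a minimal Java sketch of the formula quoted
above together with one way to recover both parts from the combined value.
The class and method names (CoordScore, combine, decodeNrMatchers,
decodeScore) are hypothetical and are not part of Lucene, and the sketch
uses double rather than float purely to keep the arithmetic easy to check.
It assumes 1 <= nrMatchers <= maxNrMatchers and a raw score in [0, b).

```java
// Hypothetical helper, not a Lucene API: packs a bounded raw score and the
// number of matching clauses into one value, and unpacks them again.
final class CoordScore {
    private CoordScore() {}

    // b * (score/b + nrMatchers - 1) / maxNrMatchers, as in the quoted message.
    // The result lies in [b*(nrMatchers-1)/maxNrMatchers, b*nrMatchers/maxNrMatchers].
    static double combine(double score, double b, int nrMatchers, int maxNrMatchers) {
        return b * (score / b + nrMatchers - 1) / maxNrMatchers;
    }

    // The integer part of combined*maxNrMatchers/b is nrMatchers - 1.
    // Clamped so that a raw score equal to b with all clauses matched
    // still decodes to maxNrMatchers.
    static int decodeNrMatchers(double combined, double b, int maxNrMatchers) {
        double scaled = combined * maxNrMatchers / b;
        int n = (int) Math.floor(scaled) + 1;
        return Math.min(n, maxNrMatchers);
    }

    // What remains after removing the matcher band, rescaled to [0, b].
    static double decodeScore(double combined, double b, int maxNrMatchers) {
        int n = decodeNrMatchers(combined, b, maxNrMatchers);
        return b * (combined * maxNrMatchers / b - (n - 1));
    }
}
```

With b = 1 and maxNrMatchers = 5, a result matching all five terms with a
raw score of 0.5 maps to about 0.9, inside [0.8, 1.0], while a result
matching four terms lands in [0.6, 0.8].  The band edges are shared: a raw
score of exactly b with n matchers gives the same value as a raw score of 0
with n + 1 matchers, so the strict "less than 0.8" property needs raw
scores strictly below b.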

Thanks,

Chuck
