lucene-dev mailing list archives

From Doug Cutting <cutt...@lucene.com>
Subject Re: Issue with Similarity and negative numbers
Date Thu, 11 Sep 2003 17:42:22 GMT
Nick Smith wrote:
> I had taken care to make sure that the change was *compatible*.  :-(

You've radically changed the range of values that can be represented. 
Instead of representing numbers from 0 to 7 billion, you represent 
numbers between -1.75 and 1.5 (at least according to the listings you 
sent).  That's a big change.

> What about the change would break lots of folks?

Anyone who uses a document or field boost larger than 1.5 will no longer 
see what they expect.  Also, existing indexes will no longer produce the 
same scores, or even the same rank ordering.

> My rationale was that
> if the mapping for positive bytes to positive floats and vice versa was
> unchanged, the only way to store negative bytes in the index would be
> to use a negative float as a field or document boost.

But what about all those folks who are using values that were 
represented by negative bytes?  I know I'm one.
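
To make the byte-to-float mapping under discussion concrete, here is a
minimal sketch of a one-byte float encoding, assuming 3 mantissa bits and
5 exponent bits. The class and method names (NormCodecSketch, decode,
encode) are hypothetical stand-ins and this is not Lucene's source. Under
that assumption the largest byte decodes to 1.75 * 2^32, roughly the
"7 billion" mentioned above, and the bytes that are negative as signed
Java bytes carry the values of 2.0 and up.

```java
// Sketch only: a hypothetical one-byte float codec, assuming a 3-bit
// mantissa and a 5-bit exponent. Not copied from Lucene.
final class NormCodecSketch {

    /** Decodes a byte into a non-negative float; byte 0 is reserved for 0.0. */
    static float decode(byte b) {
        if (b == 0) return 0.0f;
        int mantissa = b & 7;           // low 3 bits
        int exponent = (b >> 3) & 31;   // high 5 bits, sign bit included
        int bits = ((exponent + (63 - 15)) << 24) | (mantissa << 21);
        return Float.intBitsToFloat(bits);
    }

    /** Encodes a float into a byte; zero and negatives map to 0, overflow clamps to the maximum. */
    static byte encode(float f) {
        if (f <= 0.0f) return 0;
        int bits = Float.floatToIntBits(f);
        int mantissa = (bits & 0xffffff) >> 21;
        int exponent = (((bits >> 24) & 0x7f) - 63) + 15;
        if (exponent > 31) { exponent = 31; mantissa = 7; }  // too large: clamp
        if (exponent < 0)  { exponent = 0;  mantissa = 1; }  // too small: smallest non-zero
        return (byte) ((exponent << 3) | mantissa);
    }

    public static void main(String[] args) {
        System.out.println("largest positive byte (127) -> " + decode((byte) 127));
        System.out.println("first negative byte (-128)  -> " + decode((byte) -128));
        System.out.println("largest value, byte -1      -> " + decode((byte) -1));
    }
}
```

In this sketch every non-zero byte round-trips exactly through decode and
encode, so any remapping of the negative bytes changes the meaning of
values already stored with boosts of 2.0 or more.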

> I have a highly dynamic index of news headlines where the incoming
> headlines are often not in chronological order.  To make things worse,
> changes must be made to headlines post-indexing without affecting
> their chronological order.
> 
> I override the default Similarity instance to disable field
> normalization and set the date-sorting 'hint' using
> Document.setBoost(float)
> 
> Also, using the score I can implement forward / back paging, as
> the score is persistent and the document ids are not. I do this
> by using an org.apache.lucene.search.Filter and accessing the
> scores through IndexReader.norms(String field) and only setting
> the BitSet when the score is in the required range.
> 
> A previous solution used the HitFilter and document id solution
> that you suggested. Alas it did not work 100% correctly.

How about this: when indexing, set the boost to 
Similarity.decodeNorm(byte).  Then, in your HitCollector implementation, 
use IndexReader.norms() to directly access the byte stored.  Then divide 
the score by Similarity.decodeNorm(byte) to remove this factor from the 
score.  Would that do the trick?
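
The idea can be simulated in plain Java without touching Lucene at all.
The sketch below is only that: decodeNorm and encodeNorm here are local
stand-ins named after the Similarity methods mentioned above, built on an
assumed 3-bit-mantissa, 5-bit-exponent codec, and boostForHint and
scoreWithoutBoost are hypothetical helpers. It also assumes field
normalization is disabled, so the stored norm is just the boost. The point
it illustrates is that a boost chosen as the decoded value of a byte
survives encoding unchanged, so the stored byte can later be read back and
its factor divided out of the score.

```java
// Sketch only: simulates the suggestion with local stand-ins for the norm
// codec (assumed 3-bit mantissa, 5-bit exponent). No Lucene classes are used.
final class BoostRoundTripSketch {

    /** Stand-in decoder: byte to non-negative float; byte 0 means 0.0. */
    static float decodeNorm(byte b) {
        if (b == 0) return 0.0f;
        int mantissa = b & 7;
        int exponent = (b >> 3) & 31;
        return Float.intBitsToFloat(((exponent + (63 - 15)) << 24) | (mantissa << 21));
    }

    /** Stand-in encoder: float to byte, clamping out-of-range values. */
    static byte encodeNorm(float f) {
        if (f <= 0.0f) return 0;
        int bits = Float.floatToIntBits(f);
        int mantissa = (bits & 0xffffff) >> 21;
        int exponent = (((bits >> 24) & 0x7f) - 63) + 15;
        if (exponent > 31) { exponent = 31; mantissa = 7; }
        if (exponent < 0)  { exponent = 0;  mantissa = 1; }
        return (byte) ((exponent << 3) | mantissa);
    }

    /** Index time: use a boost that is exactly representable, so the hint byte is what gets stored. */
    static float boostForHint(byte hint) {
        return decodeNorm(hint);
    }

    /** Search time: divide the boost factor back out using the stored byte (must be non-zero). */
    static float scoreWithoutBoost(float score, byte storedNorm) {
        return score / decodeNorm(storedNorm);
    }

    public static void main(String[] args) {
        byte hint = (byte) 200;                    // hypothetical date-ordering hint
        byte stored = encodeNorm(boostForHint(hint));
        float score = 0.37f * decodeNorm(stored);  // hypothetical raw score times boost
        System.out.println("stored byte equals hint: " + (stored == hint));
        System.out.println("score with boost removed: " + scoreWithoutBoost(score, stored));
    }
}
```

A hint of 0 has to be avoided in this scheme, since it decodes to 0.0 and
would both zero the score and make the division meaningless.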

> Is there a FAQ entry about common date-sorting methods?

Dunno.

Cheers,

Doug


