lucene-java-user mailing list archives

From "Dan Climan" <>
Subject Re: Modifying the stored norm type
Date Tue, 20 Jun 2006 16:42:32 GMT
>Paul Elschot <> 
>>On Tuesday 20 June 2006 12:02, Marcus Falck wrote:
>> After a lot of debugging and some API doc reading I have come to the
>> conclusion that the static encodeNorm method of the Similarity class
>> will encode my boost value into a single byte decimal number.
>> And I will lose a lot of resolution and will get severe rounding
>> errors.
>> Since I need the exact float value as boost representation this isn't
>> good enough in my case.

>An exact float value is a bit of an oxymoron.
>How exact do you need it to be? 

>The range of values that can be encoded by the existing encodeNorm()
>and decodeNorm() is quite big (about 10e-10 to 10e+10 iirc),
>and since there are only 255 possible values in there (excluding 0),
>the rounding errors can be severe indeed.
>However, with a smaller range, the rounding errors would also be smaller.

>Are 256 different values enough for your case?

>Paul Elschot

I, too, have found that 256 values were not enough. I tried changing the
encodeNorm function to use a narrower range of values, but the 256 limit
makes it degrade quickly if I get any results outside the expected range.
This was true when we tried various algorithms for boosting results based on
external factors. 
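
To make the trade-off concrete, here is a minimal sketch of a narrow-range one-byte encoding. It is not Lucene's encodeNorm: the class name, the [0.5, 4.0] range, and the linear spacing are all assumptions chosen for illustration. Inside the range the rounding error is at most half a step (roughly 0.007 here), but every boost above the upper bound collapses onto the same stored value.

```java
// Illustrative only: a linear 256-level quantizer, not Lucene's actual norm encoding.
final class NarrowRangeNorm {
    // Hypothetical expected range of boosts (an assumption for this sketch).
    static final float LO = 0.5f;
    static final float HI = 4.0f;
    // Distance between two adjacent representable values.
    static final float STEP = (HI - LO) / 255f;

    /** Maps a boost onto one of 256 evenly spaced levels; out-of-range values are clamped. */
    public static byte encode(float boost) {
        float clamped = Math.max(LO, Math.min(HI, boost));
        return (byte) Math.round((clamped - LO) / STEP);
    }

    /** Recovers the level's value from the stored byte (treated as unsigned). */
    public static float decode(byte b) {
        return LO + (b & 0xFF) * STEP;
    }

    public static void main(String[] args) {
        float[] boosts = {0.5f, 1.1f, 3.999f, 10.0f, 100.0f};
        for (float boost : boosts) {
            float stored = decode(encode(boost));
            System.out.printf("boost=%9.3f stored=%8.4f error=%9.4f%n",
                    boost, stored, Math.abs(boost - stored));
        }
    }
}
```

Boosts of 10 and 100 both come back as roughly 4.0 in this sketch, which is the kind of degradation described above once values fall outside the expected range.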

FunctionQuery (not currently in core Lucene) from the Solr project may be
an alternative. Can it replace all uses of the norm?

Now that omitNorms is part of the core, the memory impact of allowing a
2-byte (or even 4-byte) norm is not nearly as severe. Any suggestions for how
to create a multi-byte norm as an option, while enabling the same code to
read existing 1-byte norms?
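
One possible shape for this, purely as a sketch: store the top 16 bits of the IEEE 754 float for the 2-byte case, and branch on a per-field width when reading. Nothing here exists in Lucene. The class, the `bytesPerNorm` parameter, and the idea of passing in the legacy 256-entry decode table are all hypothetical, and the width would have to be recorded somewhere in the index.

```java
// Hypothetical sketch of a width-aware norm reader; none of these names are Lucene APIs.
final class WideNorm {
    /** Keeps the top 16 bits of the float: sign, 8 exponent bits, 7 mantissa bits (truncating). */
    public static short encode16(float f) {
        return (short) (Float.floatToIntBits(f) >>> 16);
    }

    /** Restores a float from the 16 stored bits; the dropped mantissa bits become zero. */
    public static float decode16(short s) {
        return Float.intBitsToFloat((s & 0xFFFF) << 16);
    }

    /**
     * Reads the norm for one document.
     * bytesPerNorm is assumed to be stored per field (1 for existing indexes, 2 for new ones).
     * oneByteTable is assumed to be the existing 256-entry decode table for 1-byte norms.
     */
    public static float readNorm(byte[] norms, int doc, int bytesPerNorm, float[] oneByteTable) {
        if (bytesPerNorm == 1) {
            return oneByteTable[norms[doc] & 0xFF]; // legacy path, unchanged
        }
        int hi = norms[2 * doc] & 0xFF;      // big-endian: high byte first
        int lo = norms[2 * doc + 1] & 0xFF;
        return decode16((short) ((hi << 8) | lo));
    }

    public static void main(String[] args) {
        float boost = 1.1f;
        float stored = decode16(encode16(boost));
        System.out.println("boost=" + boost + " stored=" + stored);
    }
}
```

With 7 mantissa bits the relative error of the truncating encode stays below 1/128 across the full float range, at the cost of doubling the per-document norm memory for fields that opt in.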

