lucene-java-user mailing list archives

From Ahmet Arslan <>
Subject Re: understanding the norm encode and decode
Date Wed, 04 Mar 2015 14:04:55 GMT
Hi Adrien,

I read somewhere that norms are stored using docValues.
As I understand it, docValues can store float values losslessly.
So the question is: why do several decode/encode methods still exist in the similarity implementations?
Intuitively, switching to docValues for norms should prevent the precision loss.


On Wednesday, March 4, 2015 3:22 PM, Adrien Grand <> wrote:

Floats require 32 bits but norms are encoded on a single byte. So
there is a precision loss when encoding float values into a single
byte. In your example, 0.75 and 0.89 are sufficiently close to each
other so that they are encoded to the same byte.
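
The single-byte truncation described above can be tried out with a small, self-contained sketch. It is modeled on Lucene's SmallFloat.floatToByte315 / byte315ToFloat pair, but the class and method names here (NormByteSketch, encode, decode) are hypothetical and the code is an illustration under that assumption, not the library's own source.

```java
// Sketch only: class and method names are hypothetical, not Lucene's API.
final class NormByteSketch {
    // Offset chosen so that 1.0f maps to byte 124; it mirrors the (63 - 15)
    // constant of the SmallFloat scheme this sketch is modeled on.
    private static final int EXP_OFFSET = 63 - 15;

    // Keep only the top bits of the float (exponent plus leading mantissa
    // bits) and shift them into the range of a single byte.
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3);
        if (small <= (EXP_OFFSET << 3)) {
            // Zero and negatives become 0; tiny positives become 1.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= (EXP_OFFSET << 3) + 0x100) {
            // Too large to represent: saturate at the largest byte (0xFF).
            return (byte) -1;
        }
        return (byte) (small - (EXP_OFFSET << 3));
    }

    // Rebuild a float from the byte; the discarded mantissa bits are zero.
    static float decode(byte b) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xff) << (24 - 3);
        bits += EXP_OFFSET << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        for (float f : new float[] {0.75f, 0.8f, 0.89f, 1.0f}) {
            byte b = encode(f);
            System.out.println(f + " -> byte " + b + " -> " + decode(b));
        }
    }
}
```

In this sketch only four values per power of two survive the round trip (0.5, 0.625, 0.75 and 0.875 below 1.0), so 0.75 and 0.8 share byte 122 and both decode to 0.75. Note that 0.89 lands on byte 123 and decodes to 0.875 here, so the 0.75 figure quoted from the documentation may come from a different encoding variant; the truncation effect is the same either way.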

On Wed, Mar 4, 2015 at 4:48 AM, wangdong <> wrote:
> I read the article about the scoring section in lucene as follows:
> Encoding and decoding of the resulted float norm in a single byte are done
> by the static methods of the class Similarity: encodeNorm() and
> decodeNorm().
> Due to loss of precision, it is not guaranteed that decode(encode(x)) = x,
> e.g. decode(encode(0.89)) = 0.75. At scoring (search) time, this norm is
> brought into the score of document as *norm(t, d)*, as shown by the formula
> in Similarity.
> I cannot understand the formula decode(encode(0.89)) = 0.75.
> How can I get 0.75 from the left-hand side?
> Can anyone help me?
> Thanks in advance!
> andrew


To unsubscribe, e-mail:
For additional commands, e-mail:
