lucene-java-user mailing list archives

From Adrien Grand <jpou...@gmail.com>
Subject Re: understanding the norm encode and decode
Date Wed, 04 Mar 2015 14:54:13 GMT
Norms and doc values do indeed use the same API. However, the
implementations differ somewhat (e.g. norms are stored in memory and use
different compression schemes).

How much precision is lost is up to the Similarity. You could write a
Similarity implementation that keeps full float precision, but since
scoring is fuzzy anyway, that would multiply the memory needed for norms
by 4 (a 32-bit float instead of a single byte per document and field)
without really improving the quality of your documents' scores. This
precision loss is the right trade-off for most use cases.
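To put the "multiply by 4" figure in concrete terms, here is a back-of-the-envelope sketch. It assumes one uncompressed norm per document per normed field, which is a simplification (as noted above, real norms use compression schemes), and the document and field counts are hypothetical. `NormMemory` and `normBytes` are illustrative names, not Lucene API.

```java
// Back-of-the-envelope comparison of norm memory: one byte per document per
// normed field versus a full 32-bit float. Assumes no compression; the
// document and field counts in main() are hypothetical.
public final class NormMemory {

    /** Bytes needed to hold one uncompressed norm per document for each normed field. */
    public static long normBytes(long maxDoc, int normedFields, int bytesPerNorm) {
        return maxDoc * normedFields * bytesPerNorm;
    }

    public static void main(String[] args) {
        long maxDoc = 10_000_000L; // hypothetical index size
        int fields = 5;            // hypothetical number of fields with norms
        long oneByte = normBytes(maxDoc, fields, 1);
        long fullFloat = normBytes(maxDoc, fields, Float.BYTES);
        System.out.println("1-byte norms: " + oneByte / (1024 * 1024) + " MiB");
        System.out.println("float norms:  " + fullFloat / (1024 * 1024) + " MiB");
        System.out.println("ratio: " + (fullFloat / oneByte) + "x");
    }
}
```

Whatever the index size, the ratio is fixed at 4x, because `Float.BYTES` is 4.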

On Wed, Mar 4, 2015 at 3:04 PM, Ahmet Arslan <iorixxx@yahoo.com.invalid> wrote:
> Hi Adrien,
>
> I read somewhere that norms are stored using docValues.
> As I understand it, docvalues can store float values losslessly.
> So my question is: why do several decode/encode methods still exist in the similarity implementations?
> Intuitively, switching to docvalues for norms should prevent this precision loss.
>
> Ahmet
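Ahmet's premise can be illustrated with a small sketch: a float kept as its raw 32 bits inside a `long` slot (numeric doc values expose a `long` per document) round-trips exactly. The class and method names here are hypothetical, and this shows only the bit-level idea, not how Lucene actually stores norms. The catch, as Adrien explains, is the memory cost: at least 4 bytes per value instead of 1.

```java
// Sketch: a float stored as its exact bit pattern in a long slot (the kind of
// value a numeric doc-values column holds) round-trips without any loss.
// LosslessFloatSlot, store() and load() are hypothetical names.
public final class LosslessFloatSlot {

    /** Widens the float's exact IEEE 754 bit pattern into a long for storage. */
    public static long store(float f) {
        return Float.floatToRawIntBits(f);
    }

    /** Recovers the original float by narrowing back to the 32-bit pattern. */
    public static float load(long stored) {
        return Float.intBitsToFloat((int) stored);
    }

    public static void main(String[] args) {
        float x = 0.89f;
        // No precision is lost, unlike a one-byte norm.
        System.out.println(load(store(x)) == x);
    }
}
```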
>
>
> On Wednesday, March 4, 2015 3:22 PM, Adrien Grand <jpountz@gmail.com> wrote:
> Hi,
>
> Floats require 32 bits, but norms are encoded in a single byte, so
> there is a precision loss when encoding float values into a single
> byte. In your example, 0.75 and 0.89 are close enough to each other
> that they are encoded to the same byte.
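The truncation described here can be reproduced with a small sketch. It is a reconstruction modeled on the one-byte encoding in Lucene's `SmallFloat.floatToByte315` / `byte315ToFloat` helpers (3 significant mantissa bits, zero exponent 15), which the 3.x `Similarity` norm encoding relies on; the class name `NormCodec` is hypothetical, and you should check the real Lucene source for your version before relying on exact values. The encoder keeps the sign, the exponent and the top two explicit mantissa bits of the float, and simply drops everything below them.

```java
// Sketch of a one-byte norm codec modeled on Lucene's SmallFloat
// floatToByte315/byte315ToFloat (3 significant mantissa bits, zero exponent 15).
// NormCodec is a hypothetical name used for illustration.
public final class NormCodec {

    // Offset mapping the float's shifted exponent bits onto the 0..255 byte range.
    private static final int FZERO = (63 - 15) << 3; // 384

    /** Truncates a float to one byte; low-order mantissa bits are discarded. */
    public static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        // Keep sign, exponent and the top two explicit mantissa bits.
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= FZERO) {
            // Zero and negatives map to 0; tiny positives map to the smallest non-zero byte.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (smallfloat >= FZERO + 0x100) {
            return (byte) -1; // overflow maps to the largest encodable value
        }
        return (byte) (smallfloat - FZERO);
    }

    /** Expands a byte back to a float; the discarded bits come back as zeros. */
    public static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        for (float x : new float[] {0.75f, 0.89f, 1.0f}) {
            byte b = encode(x);
            System.out.println(x + " -> byte " + (b & 0xff) + " -> " + decode(b));
        }
    }
}
```

Running `main` prints `0.75 -> byte 122 -> 0.75`, `0.89 -> byte 123 -> 0.875` and `1.0 -> byte 124 -> 1.0`. Under this reconstruction, every value from 0.875 up to (but not including) 1.0 collapses to byte 123. So the round trip of 0.89 lands on 0.875, not on the 0.75 quoted from the 3.6.1 docs further down, and 0.75 gets its own byte. If the actual 3.x encoder matches this sketch, the documented example is off by one step. The underlying point is unchanged: decode(encode(x)) returns the nearest representable value at or below x, not x itself.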
>
>
> On Wed, Mar 4, 2015 at 4:48 AM, wangdong <hrdxwandg@gmail.com> wrote:
>> I read the following in an article about scoring in Lucene:
>>
>> Encoding and decoding of the resulted float norm in a single byte are done
>> by the static methods of the class Similarity: encodeNorm()
>> <http://lucene.apache.org/core/3_6_1/api/core/org/apache/lucene/search/Similarity.html#encodeNorm%28float%29> and decodeNorm()
>> <http://lucene.apache.org/core/3_6_1/api/core/org/apache/lucene/search/Similarity.html#decodeNorm%28byte%29>.
>> Due to loss of precision, it is not guaranteed that decode(encode(x)) = x,
>> e.g. decode(encode(0.89)) = 0.75. At scoring (search) time, this norm is
>> brought into the score of document as *norm(t, d)*, as shown by the formula
>> in Similarity
>> <http://lucene.apache.org/core/3_6_1/api/core/org/apache/lucene/search/Similarity.html>.
>>
>> I cannot understand the formula decode(encode(0.89)) = 0.75:
>> how do I get 0.75 from the left-hand side?
>>
>> Can anyone help me?
>> Thanks in advance!
>>
>> andrew
>
>
>
> --
> Adrien
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org



-- 
Adrien