lucene-java-user mailing list archives

From Ahmet Arslan <iori...@yahoo.com.INVALID>
Subject Re: understanding the norm encode and decode
Date Thu, 05 Mar 2015 14:15:21 GMT


Hi András,

That's a good catch! Do you want to correct that javadoc mistake and create a patch?
https://wiki.apache.org/lucene-java/HowToContribute

If you don't have a JIRA account, anyone can create one.
https://issues.apache.org/jira/browse/lucene

Ahmet


On Thursday, March 5, 2015 11:15 AM, András Péteri <apeteri@b2international.com> wrote:
Sorry, I also got it wrong in the previous message. :) It goes 0.89f
-> 123 -> 0.875f.
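The round trip above can be reproduced outside Lucene. The class below is a minimal standalone sketch that mirrors the bit manipulation of SmallFloat.floatToByte315 and SmallFloat.byte315ToFloat in Lucene 3.6.1 (3 significant bits, zero exponent 15). The class name SmallFloat315Demo is hypothetical, and the logic is a re-implementation to compare against the linked source, not the library's own code.

```java
/**
 * Hypothetical standalone re-implementation of the "3.15" small-float scheme
 * (3 significant bits, zero exponent 15) used for norms in Lucene 3.6.1.
 */
class SmallFloat315Demo {

    // (63 - 15) << 3 = 384: offset between the float's top 11 bits and the byte.
    private static final int FZERO = (63 - 15) << 3;

    /** Encodes a float into one byte by truncating it to 3 significant bits. */
    static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        // Keep sign, 8 exponent bits and the top 2 stored mantissa bits
        // (3 significant bits counting the implicit leading 1).
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= FZERO) {
            // Zero and negatives map to 0; positive underflow maps to 1.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (smallfloat >= FZERO + 0x100) {
            return (byte) -1; // overflow saturates at the largest value (0xFF)
        }
        return (byte) (smallfloat - FZERO);
    }

    /** Decodes a byte produced by floatToByte315 back into a float. */
    static float byte315ToFloat(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3); // treat the byte as unsigned
        bits += (63 - 15) << 24;           // restore the float exponent bias
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        byte encoded = floatToByte315(0.89f);
        float decoded = byte315ToFloat(encoded);
        System.out.println("0.89f -> " + (encoded & 0xff) + " -> " + decoded);
    }
}
```

Running main prints `0.89f -> 123 -> 0.875`. Because the encoder truncates rather than rounds, every float from 0.875f up to (but not including) 1.0f ends up as byte 123.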

On Thu, Mar 5, 2015 at 10:08 AM, András Péteri
<apeteri@b2international.com> wrote:
> Hi Andrew,
>
> If you are using Lucene 3.6.1, you can take a look at the method which
> creates a single byte value out of the received float using bit
> manipulation at [1]. There is also a 256-element decoder table in
> Similarity, where each byte corresponds to a decoded float value
> computed by [2].
>
> The first method encodes 0.89f to byte 123. 123 is decoded to 0.85f
> via the second method, so it seems that the documentation is incorrect
> in this regard.
>
> [1] https://github.com/apache/lucene-solr/blob/lucene_solr_3_6_1/lucene/core/src/java/org/apache/lucene/util/SmallFloat.java#L75
> [2] https://github.com/apache/lucene-solr/blob/lucene_solr_3_6_1/lucene/core/src/java/org/apache/lucene/util/SmallFloat.java#L88
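The 256-element decoder table mentioned above turns decoding into an array lookup at search time. A minimal sketch under the assumption that each entry is computed with the 3.6.1 byte315ToFloat layout (re-implemented here); NormTableDemo and its members are hypothetical names, not Lucene's Similarity code.

```java
/**
 * Hypothetical sketch of a precomputed 256-entry norm decoding table,
 * assuming the Lucene 3.6.1 byte315ToFloat bit layout.
 */
class NormTableDemo {

    /** Same layout as byte315ToFloat: 3 significant bits, zero exponent 15. */
    static float byte315ToFloat(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // One decoded float per possible byte value.
    static final float[] NORM_TABLE = new float[256];

    static {
        for (int i = 0; i < 256; i++) {
            NORM_TABLE[i] = byte315ToFloat((byte) i);
        }
    }

    /** Decoding is a plain lookup; the mask treats the byte as unsigned. */
    static float decodeNorm(byte b) {
        return NORM_TABLE[b & 0xff];
    }

    public static void main(String[] args) {
        for (int i = 121; i <= 125; i++) {
            System.out.println(i + " -> " + decodeNorm((byte) i));
        }
    }
}
```

For bytes 121 to 125, main prints 0.625, 0.75, 0.875, 1.0 and 1.25, which shows how coarse the steps between adjacent norm values are.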
>
> On Thu, Mar 5, 2015 at 3:45 AM, wangdong <hrdxwandg@gmail.com> wrote:
>> Thank you for your discussion.
>>
>> I am a junior Lucene user, so I am not familiar with some of the deeper
>> concepts you mentioned.
>> My question is simple: I just want to know how the official documentation
>> gets 0.75 from decode(encode(0.89)).
>>
>> Why not 0.875?   (0.875 = 0.5 + 0.25 + 0.125)
>>
>> thanks
>> andrew
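Andrew's 0.875 can be checked by hand. Assuming the Lucene 3.6.1 SmallFloat 3.15 scheme (keep the sign, the exponent and 3 significant bits, truncate the rest), the arithmetic runs as follows. Here 126 is the biased IEEE 754 single-precision exponent of 2^-1 (that is, -1 + 127), 3 is the value of the two stored mantissa bits 11₂, and 384 = (63 - 15) · 8 is the offset used by SmallFloat.

```latex
\begin{aligned}
0.89 &= 1.78 \times 2^{-1}, \qquad 1.78 = 1.1100011\ldots_2 \\
1.1100011\ldots_2 &\;\to\; 1.11_2 = 1 + 0.5 + 0.25 = 1.75 \quad \text{(keep 3 significant bits)} \\
\mathrm{decode}(\mathrm{encode}(0.89)) &= 1.75 \times 2^{-1} = 0.875 = 0.5 + 0.25 + 0.125 \\
\mathrm{encode}(0.89) &= (4 \cdot 126 + 3) - 384 = 123 \\
0.75 = 1.10_2 \times 2^{-1}, \quad \mathrm{encode}(0.75) &= (4 \cdot 126 + 2) - 384 = 122
\end{aligned}
```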
>>
On 2015/3/4 22:54, Adrien Grand wrote:
>>>
>>> Norms and doc values do indeed use the same API. However, the
>>> implementations differ a bit (e.g. norms are stored in memory and use
>>> different compression schemes).
>>>
>>> The precision loss is up to the similarity. You could write a
>>> similarity impl which keeps full float precision, but since scoring is
>>> fuzzy anyway, this would multiply your memory needs for norms by 4
>>> while not really improving the quality of the scores of your
>>> documents. This precision loss is the right trade-off for most
>>> use-cases.
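To put a number on the factor of 4, the sketch below compares raw, uncompressed storage of one norm per document for a single field: one byte per norm versus a full 32-bit float kept via Float.floatToRawIntBits. The class NormMemoryDemo, its method names and the 10 million document figure are hypothetical; this is not Lucene's norms API and it ignores any compression Lucene applies.

```java
/**
 * Hypothetical comparison of per-document norm storage cost:
 * a lossy 1-byte encoding versus a lossless 4-byte float.
 */
class NormMemoryDemo {

    /** Lossless alternative: keep all 32 bits of the float. */
    static int encodeFull(float norm) {
        return Float.floatToRawIntBits(norm);
    }

    /** Exact inverse of encodeFull. */
    static float decodeFull(int bits) {
        return Float.intBitsToFloat(bits);
    }

    /** Raw bytes needed to hold one norm per document for one field. */
    static long normBytes(long maxDoc, int bytesPerNorm) {
        return maxDoc * bytesPerNorm;
    }

    public static void main(String[] args) {
        float norm = 0.89f;
        // No precision is lost when all 32 bits are stored.
        System.out.println(decodeFull(encodeFull(norm)) == norm);

        long maxDoc = 10_000_000L; // hypothetical index size
        System.out.println("1-byte norms: " + normBytes(maxDoc, 1) + " bytes");
        System.out.println("4-byte norms: " + normBytes(maxDoc, Float.BYTES) + " bytes");
    }
}
```

main prints `true`, then 10000000 versus 40000000 bytes: exact scores for the cost of four times the memory.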
>>>
>>> On Wed, Mar 4, 2015 at 3:04 PM, Ahmet Arslan <iorixxx@yahoo.com.invalid>
>>> wrote:
>>>>
>>>> Hi Adrien,
>>>>
>>>> I read somewhere that norms are stored using docValues.
>>>> In my understanding, docvalues can store lossless float values.
>>>> So the question is: why do several encode/decode methods still exist in
>>>> similarity implementations?
>>>> Intuitively, switching to doc values for norms should prevent the
>>>> precision loss.
>>>>
>>>> Ahmet
>>>>
>>>>
>>>> On Wednesday, March 4, 2015 3:22 PM, Adrien Grand <jpountz@gmail.com>
>>>> wrote:
>>>> Hi,
>>>>
>>>> Floats require 32 bits but norms are encoded on a single byte. So
>>>> there is a precision loss when encoding float values into a single
>>>> byte. In your example, 0.75 and 0.89 are sufficiently close to each
>>>> other that they are encoded to the same byte.
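One way to see what a single byte throws away is to look at the interval of floats that each byte stands for. Assuming the Lucene 3.6.1 SmallFloat 3.15 scheme (a truncating encoder, re-implemented here in a hypothetical NormBucketDemo class), byte b covers [decode(b), decode(b + 1)) for 2 <= b < 255. Under that assumption, 0.75f and 0.89f fall into adjacent buckets, 122 and 123, rather than the same one.

```java
/**
 * Hypothetical helper showing which range of floats each norm byte covers,
 * assuming the Lucene 3.6.1 SmallFloat 3.15 layout (truncating encoder).
 */
class NormBucketDemo {

    // (63 - 15) << 3 = 384: offset between the float's top 11 bits and the byte value.
    private static final int FZERO = (63 - 15) << 3;

    /** Encodes f and returns the byte as an unsigned value in 0..255. */
    static int encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> 21; // sign, 8 exponent bits, top 2 mantissa bits
        if (smallfloat <= FZERO) {
            return bits <= 0 ? 0 : 1; // zero/negative -> 0, tiny positive -> 1
        }
        if (smallfloat >= FZERO + 0x100) {
            return 255; // overflow saturates at the largest byte
        }
        return smallfloat - FZERO;
    }

    /** Decodes unsigned byte b; for b >= 2 this is the smallest float encoding to b. */
    static float decode(int b) {
        if (b == 0) {
            return 0.0f;
        }
        return Float.intBitsToFloat((b + FZERO) << 21);
    }

    public static void main(String[] args) {
        float[] samples = {0.75f, 0.8f, 0.875f, 0.89f, 0.99f};
        for (float f : samples) {
            int b = encode(f);
            // The upper bound uses b + 1, so it is only meaningful for b < 255.
            System.out.println(f + " -> byte " + b
                    + ", bucket [" + decode(b) + ", " + decode(b + 1) + ")");
        }
    }
}
```

0.75f and 0.8f print byte 122 with bucket [0.75, 0.875); 0.875f, 0.89f and 0.99f print byte 123 with bucket [0.875, 1.0).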
>>>>
>>>>
>>>> On Wed, Mar 4, 2015 at 4:48 AM, wangdong <hrdxwandg@gmail.com> wrote:
>>>>>
>>>>> I read the following in the scoring article of the Lucene documentation:
>>>>>
>>>>> Encoding and decoding of the resulted float norm in a single byte are
>>>>> done by the static methods of the class Similarity: encodeNorm()
>>>>> <http://lucene.apache.org/core/3_6_1/api/core/org/apache/lucene/search/Similarity.html#encodeNorm%28float%29>
>>>>> and decodeNorm()
>>>>> <http://lucene.apache.org/core/3_6_1/api/core/org/apache/lucene/search/Similarity.html#decodeNorm%28byte%29>.
>>>>> Due to loss of precision, it is not guaranteed that decode(encode(x)) = x,
>>>>> e.g. decode(encode(0.89)) = 0.75. At scoring (search) time, this norm is
>>>>> brought into the score of document as *norm(t, d)*, as shown by the
>>>>> formula in Similarity
>>>>> <http://lucene.apache.org/core/3_6_1/api/core/org/apache/lucene/search/Similarity.html>.
>>>>>
>>>>> I cannot understand the formula decode(encode(0.89)) = 0.75.
>>>>> How can I get 0.75 from the left-hand side?
>>>>>
>>>>> Can anyone help me?
>>>>> Thanks in advance!
>>>>>
>>>>> andrew
>>>>
>>>>
>>>>
>>>> --
>>>> Adrien
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>
>>>
>>
>
> --
> András



-- 
Péteri András

