Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: java-user@lucene.apache.org
Received-SPF: pass (athena.apache.org: domain of hrdxwandg@gmail.com
 designates 209.85.220.46 as permitted sender)
Message-ID: <54F85E55.60507@gmail.com>
Date: Thu, 05 Mar 2015 21:47:01 +0800
From: wangdong <hrdxwandg@gmail.com>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64;
 rv:31.0) Gecko/20100101 Thunderbird/31.5.0
MIME-Version: 1.0
To: java-user@lucene.apache.org
Subject: Re: understanding the norm encode and decode
References: 
 <CAPsWd+O5Dhix_2-ydNYXSyeik0yzjmUFhaFTDUkSOJO2_7iczA@mail.gmail.com>
 <1893737227.2549586.1425477895288.JavaMail.yahoo@mail.yahoo.com>
 <CAPsWd+OeSY-vt8rkyh88=-KG1wJE=C1A5gAZdmO9HXnBa6dFpw@mail.gmail.com>
 <54F7C346.6050208@gmail.com>
 <CAO=0Lobpwf1LJ9zOVaCjT_fv4fHZgpXDuaeeFV-V3GKMtjR3Dw@mail.gmail.com>
 <CAO=0Lob41upAfeZcmT5AAepR7bqPYHgUr_ZsAz9AzURV5O6Zeg@mail.gmail.com>
In-Reply-To: 
 <CAO=0Lob41upAfeZcmT5AAepR7bqPYHgUr_ZsAz9AzURV5O6Zeg@mail.gmail.com>
Content-Type: multipart/alternative;
 boundary="------------090801090203060608080608"

--------------090801090203060608080608
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit

Thank you for your detailed answer. I get it now.
Because the document I read is official material, I was not sure whether it
could be wrong, so I asked the question.

Thank you again!

andrew

On 2015/3/5 17:14, András Péteri wrote:
> Sorry, I also got it wrong in the previous message. :) It goes 0.89f
> -> 123 -> 0.875f.
>
> On Thu, Mar 5, 2015 at 10:08 AM, András Péteri
> <apeteri@b2international.com> wrote:
>> Hi Andrew,
>>
>> If you are using Lucene 3.6.1, you can take a look at the method which
>> creates a single byte value out of the received float using bit
>> manipulation at [1]. There is also a 256-element decoder table in
>> Similarity, where each byte corresponds to a decoded float value
>> computed by [2].
>>
>> The first method encodes 0.89f to byte 123. 123 is decoded to 0.85f
>> via the second method, so it seems that the documentation is incorrect
>> in this regard.
>>
>> [1] https://github.com/apache/lucene-solr/blob/lucene_solr_3_6_1/lucene/core/src/java/org/apache/lucene/util/SmallFloat.java#L75
>> [2] https://github.com/apache/lucene-solr/blob/lucene_solr_3_6_1/lucene/core/src/java/org/apache/lucene/util/SmallFloat.java#L88
>>
>> On Thu, Mar 5, 2015 at 3:45 AM, wangdong <hrdxwandg@gmail.com> wrote:
>>> Thank you for the discussion.
>>>
>>> I am a junior user of Lucene, so I am not familiar with some of the deeper
>>> concepts you mentioned.
>>> My question is simple. I just want to know how to get 0.75 from
>>> decode(encode(0.89)) in the official documentation.
>>>
>>> Why not 0.875? (0.875 = 0.5 + 0.25 + 0.125)
>>>
>>> thanks
>>> andrew
>>>
>>> On 2015/3/4 22:54, Adrien Grand wrote:
>>>> Norms and doc values are indeed using the same API. However
>>>> implementations differ a bit (eg. norms are stored in memory and use
>>>> different compression schemes).
>>>>
>>>> The precision loss is up to the similarity. You could write a
>>>> similarity impl which keeps full float precision, but scoring being
>>>> fuzzy anyway this would multiply your memory needs for norms by 4
>>>> while not really improving the quality of the scores of your
>>>> documents. This precision loss is the right trade-off for most
>>>> use-cases.
>>>>
>>>> On Wed, Mar 4, 2015 at 3:04 PM, Ahmet Arslan <iorixxx@yahoo.com.invalid>
>>>> wrote:
>>>>> Hi Adrien,
>>>>>
>>>>> I read somewhere that norms are stored using docValues.
>>>>> In my understanding, docValues can store lossless float values.
>>>>> So the question is: why do several decode/encode methods still exist in
>>>>> similarity implementations?
>>>>> Intuitively, switching to docValues for norms should prevent the
>>>>> precision loss.
>>>>>
>>>>> Ahmet
>>>>>
>>>>>
>>>>> On Wednesday, March 4, 2015 3:22 PM, Adrien Grand <jpountz@gmail.com>
>>>>> wrote:
>>>>> Hi,
>>>>>
>>>>> Floats require 32 bits but norms are encoded on a single byte. So
>>>>> there is a precision loss when encoding float values into a single
>>>>> byte. In your example, 0.75 and 0.89 are sufficiently close to each
>>>>> other so that they are encoded to the same byte.
>>>>>
>>>>>
>>>>> On Wed, Mar 4, 2015 at 4:48 AM, wangdong <hrdxwandg@gmail.com> wrote:
>>>>>> I read the article about the scoring section in lucene as follows:
>>>>>>
>>>>>> Encoding and decoding of the resulted float norm in a single byte are
>>>>>> done by the static methods of the class Similarity: encodeNorm()
>>>>>> <http://lucene.apache.org/core/3_6_1/api/core/org/apache/lucene/search/Similarity.html#encodeNorm%28float%29>
>>>>>> and decodeNorm()
>>>>>> <http://lucene.apache.org/core/3_6_1/api/core/org/apache/lucene/search/Similarity.html#decodeNorm%28byte%29>.
>>>>>> Due to loss of precision, it is not guaranteed that decode(encode(x)) = x,
>>>>>> e.g. decode(encode(0.89)) = 0.75. At scoring (search) time, this norm is
>>>>>> brought into the score of document as *norm(t, d)*, as shown by the
>>>>>> formula in Similarity
>>>>>> <http://lucene.apache.org/core/3_6_1/api/core/org/apache/lucene/search/Similarity.html>.
>>>>>>
>>>>>> I cannot understand the formula decode(encode(0.89)) = 0.75.
>>>>>> How do I get 0.75 from the left-hand side?
>>>>>>
>>>>>> Can anyone help me?
>>>>>> Thanks in advance!
>>>>>>
>>>>>> andrew
>>>>>
>>>>>
>>>>> --
>>>>> Adrien
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>
>>>>
>> --
>> András
>
>
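
For anyone reading this thread later, the round trip discussed above
(0.89f -> 123 -> 0.875f) can be reproduced without Lucene on the classpath.
The sketch below is modelled on my reading of the two SmallFloat methods
András linked ([1] and [2]); the class name NormRoundTrip and the method
names encode/decode are made up for illustration and are not Lucene API, so
please treat it as an approximation and check it against the real source.

```java
// Illustrative sketch only: approximates the 3-bit-mantissa byte encoding
// that the linked SmallFloat methods appear to use for norms in Lucene 3.6.1.
// NormRoundTrip, encode and decode are hypothetical names, not Lucene API.
final class NormRoundTrip {

    // Shifts the float exponent range onto the byte's exponent range.
    private static final int EXPONENT_OFFSET = (63 - 15) << 3;

    // Keep the exponent and the top 3 mantissa bits, drop everything else.
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3);
        if (small <= EXPONENT_OFFSET) {
            // Zero and negatives map to 0; tiny positives map to the smallest non-zero byte.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= EXPONENT_OFFSET + 0x100) {
            // Too large for one byte: saturate at the largest value.
            return (byte) -1;
        }
        return (byte) (small - EXPONENT_OFFSET);
    }

    // Rebuild a float from the byte; the dropped mantissa bits come back as zeros.
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        for (float f : new float[] {0.89f, 0.75f}) {
            byte b = encode(f);
            System.out.println(f + " -> " + (b & 0xff) + " -> " + decode(b));
        }
        // prints:
        // 0.89 -> 123 -> 0.875
        // 0.75 -> 122 -> 0.75
    }
}
```

With only three mantissa bits kept, every value from 0.875 up to (but not
including) 1.0 lands on byte 123, which is why 0.89 comes back as 0.875.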


--------------090801090203060608080608--