Date: Thu, 05 Mar 2015 21:47:01 +0800
From: wangdong
To: java-user@lucene.apache.org
Subject: Re: understanding the norm encode and decode

Thank you for your detailed answer. I get it now.
The document I read is official material, so I assumed it was correct,
and that is why I asked the question.
Thank you again!

andrew

On 2015/3/5 17:14, András Péteri wrote:
> Sorry, I also got it wrong in the previous message. :) It goes 0.89f
> -> 123 -> 0.875f.
>
> On Thu, Mar 5, 2015 at 10:08 AM, András Péteri wrote:
>> Hi Andrew,
>>
>> If you are using Lucene 3.6.1, you can take a look at the method which
>> creates a single byte value out of the received float using bit
>> manipulation at [1]. There is also a 256-element decoder table in
>> Similarity, where each byte corresponds to a decoded float value
>> computed by [2].
>>
>> The first method encodes 0.89f to byte 123. 123 is decoded to 0.85f
>> via the second method, so it seems that the documentation is incorrect
>> in this regard.
>>
>> [1] https://github.com/apache/lucene-solr/blob/lucene_solr_3_6_1/lucene/core/src/java/org/apache/lucene/util/SmallFloat.java#L75
>> [2] https://github.com/apache/lucene-solr/blob/lucene_solr_3_6_1/lucene/core/src/java/org/apache/lucene/util/SmallFloat.java#L88
>>
>> On Thu, Mar 5, 2015 at 3:45 AM, wangdong wrote:
>>> Thank you for your discussion.
>>>
>>> I am a junior user of Lucene, so I am not familiar with some of the
>>> deeper concepts you mentioned.
>>> My question is simple. I just want to know how the official
>>> documentation gets 0.75 from decode(encode(0.89)).
>>>
>>> Why not 0.875? (0.875 = 0.5 + 0.25 + 0.125)
>>>
>>> thanks
>>> andrew
>>>
>>> On 2015/3/4 22:54, Adrien Grand wrote:
>>>> Norms and doc values are indeed using the same API. However,
>>>> implementations differ a bit (e.g. norms are stored in memory and use
>>>> different compression schemes).
>>>>
>>>> The precision loss is up to the similarity. You could write a
>>>> similarity impl which keeps full float precision, but scoring being
>>>> fuzzy anyway this would multiply your memory needs for norms by 4
>>>> while not really improving the quality of the scores of your
>>>> documents. This precision loss is the right trade-off for most
>>>> use-cases.
>>>>
>>>> On Wed, Mar 4, 2015 at 3:04 PM, Ahmet Arslan wrote:
>>>>> Hi Adrien,
>>>>>
>>>>> I read somewhere that norms are stored using docValues.
>>>>> In my understanding, docValues can store lossless float values.
>>>>> So the question is, why do several decode/encode methods still exist
>>>>> in similarity implementations?
>>>>> Intuitively, switching to docValues for norms should prevent the
>>>>> precision loss.
>>>>>
>>>>> Ahmet
>>>>>
>>>>>
>>>>> On Wednesday, March 4, 2015 3:22 PM, Adrien Grand wrote:
>>>>> Hi,
>>>>>
>>>>> Floats require 32 bits but norms are encoded on a single byte. So
>>>>> there is a precision loss when encoding float values into a single
>>>>> byte.
>>>>> In your example, 0.75 and 0.89 are sufficiently close to each
>>>>> other so that they are encoded to the same byte.
>>>>>
>>>>>
>>>>> On Wed, Mar 4, 2015 at 4:48 AM, wangdong wrote:
>>>>>> I read the article about the scoring section in Lucene, as follows:
>>>>>>
>>>>>> Encoding and decoding of the resulted float norm in a single byte are
>>>>>> done by the static methods of the class Similarity: encodeNorm()
>>>>>> and decodeNorm().
>>>>>> Due to loss of precision, it is not guaranteed that
>>>>>> decode(encode(x)) = x, e.g. decode(encode(0.89)) = 0.75. At scoring
>>>>>> (search) time, this norm is brought into the score of document as
>>>>>> norm(t, d), as shown by the formula in Similarity.
>>>>>>
>>>>>> I cannot understand the formula decode(encode(0.89)) = 0.75.
>>>>>> How do I get 0.75 from the left-hand side?
>>>>>>
>>>>>> Can anyone help me?
>>>>>> Thanks in advance!
>>>>>>
>>>>>> andrew
>>>>>
>>>>>
>>>>> --
>>>>> Adrien
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>
>>>>
>> --
>> András
>
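
The round trip discussed in this thread (0.89f -> 123 -> 0.875f) can be checked with a small standalone sketch. The two methods below are modeled on SmallFloat.floatToByte315 and SmallFloat.byte315ToFloat as linked at [1] and [2]; they are written out here for illustration and should be compared against the Lucene 3.6.1 source rather than taken as authoritative. The class name NormRoundTrip and the helper roundTrip are hypothetical and not part of Lucene.

```java
// Standalone sketch of the one-byte norm encoding discussed in this thread.
// Modeled on SmallFloat.floatToByte315 / byte315ToFloat (Lucene 3.6.1);
// the class name and the roundTrip helper are hypothetical.
final class NormRoundTrip {

    // (63 - 15) << 3: offset subtracted from the float's top 11 bits
    // so that the result fits into a single byte.
    private static final int BIAS = (63 - 15) << 3;

    static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3);            // keep only the top 11 bits
        if (small <= BIAS) {
            // zero and negative values map to 0, tiny positives to 1
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= BIAS + 0x100) {
            return -1;                           // overflow: largest byte (0xFF)
        }
        return (byte) (small - BIAS);            // low bits are simply truncated
    }

    static float byte315ToFloat(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);       // undo the shift
        bits += (63 - 15) << 24;                 // undo the bias
        return Float.intBitsToFloat(bits);
    }

    // Formats "input -> encoded byte -> decoded float" for one value.
    static String roundTrip(float f) {
        byte b = floatToByte315(f);
        return f + " -> " + (b & 0xff) + " -> " + byte315ToFloat(b);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(0.89f));    // 0.89 -> 123 -> 0.875
        System.out.println(roundTrip(0.75f));    // 0.75 -> 122 -> 0.75
    }
}
```

Under these assumptions, 0.89f and 0.75f land on neighbouring bytes (123 and 122). The encoder truncates rather than rounds, so 0.89f decodes to 0.875f, which matches the corrected figure given by András above.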