From: Paul Elschot
To: lucene-user@jakarta.apache.org
Subject: Re: document boost not showing up in Explanation
Date: Tue, 28 Dec 2004 11:05:51 +0100

On Tuesday 28 December 2004 08:37,
Erik Hatcher wrote:
>
> On Dec 27, 2004, at 9:54 PM, Vikas Gupta wrote:
> > I am using lucene-1.4.1.jar (with nutch). For some reason, the effect of
> > document boost is not showing up in the search results. Also, why is it
> > not a part of the Explanation
>
> It actually is part of it....
>
> > Below is the 'explanation' of a sample query "solar". I don't see the
> > boost value (1.5514448) being used at all in the calculation of the
> > document score - from the 'explanation' below and also from the
> > quality of the search.
> >
> > How can I see the effect of document boost?
>
> Document boost is not stored in the index as-is.  A single
> normalization factor is stored per-field and is computed at indexing
> time using field and document boosts, as well as the length
> normalization factor (and perhaps other factors I'm forgetting at the
> moment?).

This also means that the explanation can only show the field
normalisation factor as it is stored in the index.

One reason that boosting does not necessarily show up in the quality of
the search is that the factor is encoded in a single byte, which allows
only 256 different values to be stored. The value stored in the index
(called the norm) is the product of the document boost factor, the field
boost factor and the lengthNorm() of the field.

For the search results to actually change because of the boost factors,
the boost must be large enough to move this stored factor to another one
of the 256 possible values.

The range of possible values stored in the index is roughly from
7x10^9 to 2x10^-9. See:
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/document/Field.html#setBoost(float)
and
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Similarity.html#encodeNorm(float)

The ratio between the largest and smallest stored values (excluding the
zero special case) is about 7x10^9 / 2x10^-9 = 3.5x10^18.
The base-10 logarithm of that is about 18.5.
So per factor 10 there are about 255/18.5 = 13.8 encoded values.
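The arithmetic above can be checked with a few lines of Python. This is
only a sketch: the two range limits are the approximate figures quoted
above, the function and constant names are made up for illustration, and
it assumes the 255 non-zero byte values are spread evenly on a
logarithmic scale, which simplifies the real byte encoding.

```python
import math

# Approximate limits of the stored norm, as quoted in the message.
LARGEST_NORM = 7e9
SMALLEST_NORM = 2e-9
NONZERO_CODES = 255  # 256 byte values minus the zero special case


def norm(doc_boost, field_boost, length_norm):
    """The value that gets encoded: the plain product of the three factors."""
    return doc_boost * field_boost * length_norm


def decades_covered(largest=LARGEST_NORM, smallest=SMALLEST_NORM):
    """Number of factors of 10 between the smallest and largest norm."""
    return math.log10(largest / smallest)


def encoded_values_per_decade(codes=NONZERO_CODES):
    """Average number of encoded values per factor of 10.

    Assumes an even logarithmic spread of the codes (a simplification).
    """
    return codes / decades_covered()


if __name__ == "__main__":
    print(f"range ratio       : {LARGEST_NORM / SMALLEST_NORM:.1e}")
    print(f"decades covered   : {decades_covered():.1f}")
    print(f"values per decade : {encoded_values_per_decade():.1f}")
```

The point of the sketch is only that the boosts never reach the index
separately: they are folded into one product, and that product has to
share about 14 byte values per factor of 10.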
So, the minimum boost factor that should change a document score is
about 10^(1/13.8) = 1.18.
Since the default lengthNorm is the inverse square root of the field
length, a field length should change by at least the square of that
(roughly a factor 1.4) to change the document score (assuming no hits in
the changed field text).

Finally, a change in document score only influences the document
ordering in the search results when another document has a score that is
within the range of the change.

Regards,
Paul Elschot.