From: Doron Cohen
Date: Tue, 12 Dec 2006 21:10:43 -0800
To: java-user@lucene.apache.org
Subject: Re: Re: Re: Questions about Lucene scoring (was: Lucene 1.2 - scoring formula needed)
In-Reply-To: <20061212085928.212210@gmx.net>

"Karl Koch" wrote:

> For the documents Lucene employs
> its norm_d_t which is explained as:
>
> norm_d_t : square root of number of tokens in d in the same field as t

Actually (by default) it is:

    1 / sqrt(#tokens in d with same field as t)

> basically just the square root of the number of unique terms in the
> document (since I do search over all fields always). I would have
> expected cosine normalisation here...
>
> The paper you provided uses document normalisation in the following way:
>
> norm = 1 / sqrt(0.8*avgDocLength + 0.2*(# of unique terms in d))
>
> I am not sure how this relates to norm_d_t.

That system is less "field oriented" than Lucene, so you could say the
normalization there goes over all the fields. The {0.8,0.2} args are
parameters that control how aggressive this normalization is. If you
used {0,1} there you would get

    1 / sqrt(#unique terms in d)

and that would be similar to Lucene's

    1 / sqrt(#tokens in d with same field as t)

However, in that system this would have punished long documents too
much and boosted trivially short documents too much, and that's why
the {0.8,0.2} were introduced there.
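
To make the comparison concrete, here is a small Python sketch of the two
formulas quoted above. It is only an illustration: the function and
parameter names are made up for this example and are not Lucene API, and
the average length of 100 is an arbitrary assumed value.

```python
import math


def lucene_default_norm(num_tokens_in_field):
    """1 / sqrt(#tokens in d with same field as t), as quoted above.

    Hypothetical helper for illustration, not a Lucene method.
    """
    return 1.0 / math.sqrt(num_tokens_in_field)


def blended_norm(num_unique_terms, avg_doc_length, w_avg=0.8, w_doc=0.2):
    """1 / sqrt(w_avg*avgDocLength + w_doc*(# of unique terms in d)).

    The defaults {0.8, 0.2} are the values quoted from the paper;
    the parameter names w_avg and w_doc are assumptions of this sketch.
    """
    return 1.0 / math.sqrt(w_avg * avg_doc_length + w_doc * num_unique_terms)


if __name__ == "__main__":
    avg_len = 100.0  # assumed average document length, for illustration only
    for length in (4, 100, 2500):
        print(
            length,
            lucene_default_norm(length),
            blended_norm(length, avg_len),
            blended_norm(length, avg_len, w_avg=0.0, w_doc=1.0),
        )
```

With {0,1} the blended formula reduces to 1 / sqrt(#unique terms in d), and
with {0.8,0.2} the gap between the norm of a very short document and that
of a very long one is much smaller than with the plain 1 / sqrt(length)
form.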