From: Paul Elschot
To: lucene-user@jakarta.apache.org
Subject: Re: Does Lucene perform ranking in the retrieved set?
Date: Tue, 30 Nov 2004 19:23:26 +0100
Message-Id: <200411301923.26135.paul.elschot@xs4all.nl>

On Tuesday 30 November 2004 18:46, Xiangyu Jin wrote:
>
> This might be a stupid question.
>
> When performing retrieval for a query, does Lucene first get
> a subset of candidate matches and then perform the ranking
> on that set? That is, is the similarity calculation performed
> only on a subset of the documents for the query?

Yes, Lucene uses an inverted index for this.

> If so, from which module could I get those candidate docs,
> so that I can perform my own similarity calculations? (I
> might need to rewrite the normalization factor, so
> modifying only the "similarity" model seems unlikely to
> work.)

To change the normalisation you may consider implementing your own Weight:
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Weight.html

For example implementations of Weight, the Lucene source code in the
org.apache.lucene.search package is the best resource.
Using your own Weight also requires a subclass of Query that returns this
Weight from its createWeight() method.

> Or, is there a document describing the procedure of how Lucene
> performs a search?

This describes the scoring:
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Similarity.html
See also DefaultSimilarity.

Regards,
Paul Elschot
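
To illustrate the point about the inverted index, here is a minimal sketch in plain Java. It is not Lucene code and uses no Lucene classes: the class ToyInvertedIndex and its methods add() and search() are hypothetical names made up for this example, and the score is a simplified tf-idf with length normalisation, assumed here only for illustration rather than taken from Lucene's actual formula. It shows that only documents found in the postings of at least one query term are ever scored, and that the normalisation factor is one isolated term that could be swapped out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy illustration only; none of these names exist in Lucene.
final class ToyInvertedIndex {
    // term -> (docId -> frequency of the term in that document)
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();
    // docId -> number of terms in the document (input to the normalisation)
    private final Map<Integer, Integer> docLengths = new HashMap<>();

    // Lower-cases the text and splits it on whitespace, dropping empty tokens.
    private static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase().trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    void add(int docId, String text) {
        List<String> tokens = tokenize(text);
        docLengths.put(docId, tokens.size());
        for (String term : tokens) {
            postings.computeIfAbsent(term, t -> new TreeMap<>())
                    .merge(docId, 1, Integer::sum);
        }
    }

    // Returns docId -> score, visiting only documents that appear in the
    // postings of at least one query term (the candidate set).
    Map<Integer, Double> search(String query) {
        Map<Integer, Double> scores = new TreeMap<>();
        for (String term : tokenize(query)) {
            Map<Integer, Integer> docs = postings.get(term);
            if (docs == null) {
                continue; // term not indexed: contributes no candidates
            }
            // Assumed idf form: rarer terms weigh more.
            double idf = 1.0 + Math.log((double) docLengths.size() / (docs.size() + 1));
            for (Map.Entry<Integer, Integer> entry : docs.entrySet()) {
                double tf = Math.sqrt(entry.getValue());
                // Assumed length normalisation; replace this line to
                // experiment with a different normalisation factor.
                double norm = 1.0 / Math.sqrt(docLengths.get(entry.getKey()));
                scores.merge(entry.getKey(), tf * idf * norm, Double::sum);
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add(1, "lucene ranking search");
        index.add(2, "apache web server");
        index.add(3, "lucene lucene index");
        // Document 2 does not contain the query term, so it is never scored.
        System.out.println(index.search("lucene"));
    }
}
```

In this sketch document 2 never reaches the scoring loop for the query "lucene", which is the sense in which ranking happens only on the retrieved subset. In Lucene itself the corresponding extension points are the Weight and Similarity classes mentioned above.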