From: Xiangyu Jin
To: lucene-user@jakarta.apache.org
Date: Tue, 30 Nov 2004 11:17:38 -0500 (EST)
Subject: Lucene's ranking function VS Standard VSM model

I have seen different versions of Lucene's ranking function in the similarity documentation and on the Lucene user list.
Since I need document-document similarities, I issue the document itself as the query. I found that the result differs when we issue "computer computer" to Lucene versus to a standard VSM. The latter treats "computer computer" the same as "computer", but Lucene doesn't.

To state my question more clearly, I wrote a more formal document,
http://www.cs.virginia.edu/~xj3a/lucene_ranking.pdf
so that there is no ambiguity in the formulas.

I am not sure whether I understand this correctly, but the main reason seems to come from Lucene's query parser. By default it assumes that each term appears once. If a term is repeated in the query string, the results are unexpected. For details, please refer to the link above.

Thanks,
Xiangyu Jin
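
The VSM half of the "computer computer" observation can be checked with a short sketch. It assumes raw term-frequency weights and whitespace tokenization, both simplifications that are not taken from the linked PDF. The unnormalized dot product is included only as a contrast showing how a score can change when the query vector is not length-normalized; it is not Lucene's actual scoring formula.

```python
import math
from collections import Counter


def tf_vector(text):
    """Raw term-frequency vector for a lowercased, whitespace-tokenized string."""
    return Counter(text.lower().split())


def dot(a, b):
    """Dot product of two sparse term-weight vectors."""
    return sum(weight * b.get(term, 0) for term, weight in a.items())


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is empty."""
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm else 0.0


# Hypothetical toy document, used only for illustration.
doc = tf_vector("computer science and computer engineering")
q_single = tf_vector("computer")
q_repeated = tf_vector("computer computer")

# Cosine is invariant to scaling the query vector, so repeating the
# term does not change the VSM score.
cos_single = cosine(q_single, doc)
cos_repeated = cosine(q_repeated, doc)

# Without query-length normalization, repeating the term scales the score.
dot_single = dot(q_single, doc)
dot_repeated = dot(q_repeated, doc)

print(cos_single, cos_repeated)
print(dot_single, dot_repeated)
```

Because the repeated query is a scalar multiple of the single-term query, the two cosine values agree, while the unnormalized scores differ by a factor of two.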