From: Chris Hostetter
Date: Wed, 15 Dec 2004 12:14:16 -0800 (PST)
To: Lucene Users List
Subject: Re: A question about scoring function in Lucene
In-Reply-To: <41C08C85.8090201@apache.org>

: I question whether such scores are more meaningful.
: Yes, such scores
: would be guaranteed to be between zero and one, but would 0.8 really be
: meaningful?  I don't think so.  Do you have pointers to research which
: demonstrates this?  E.g., when such a scoring method is used, that
: thresholding by score is useful across queries?

I freely admit that I'm way out of my league on these scoring discussions,
but I believe what the OP was referring to was not any intrinsic benefit in
having a score between 0 and 1, but the benefit of having a uniform
normalization of scores regardless of search terms.

For example, using the current scoring equation, if I do a search for
"Doug Cutting" and the results/scores I get back are...

      1:   0.9
      2:   0.3
      3:   0.21
      4:   0.21
      5:   0.1

...then there are at least two meaningful pieces of data I can glean:

   a) document #1 is significantly better than the other results
   b) documents #3 and #4 are both equally relevant to "Doug Cutting"

If I then do a search for "Chris Hostetter" and get back the following
results/scores...

      9:   0.9
      8:   0.3
      7:   0.21
      6:   0.21
      5:   0.1

...then I can assume the same corresponding information is true about my
new search term (#9 is significantly better, and #7/#6 are equally good).

However, I *cannot* say either of the following:

   x) document #9 is as relevant for "Chris Hostetter" as document #1 is
      relevant to "Doug Cutting"
   y) document #5 is equally relevant to both "Chris Hostetter" and
      "Doug Cutting"

I think the OP is arguing that if the scoring algorithm were modified in the
way they suggested, then you would be able to make statements x & y.

If they are correct, then I for one can see a definite benefit in that, if
for no other reason than in making minimum score thresholds more
meaningful.

-Hoss
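
To make the distinction between within-query and cross-query comparisons concrete, here is a minimal sketch in Python. It assumes a simplified per-query normalization (divide every raw score by that query's top score). This is an illustration only and not Lucene's actual scoring formula; the function names and all raw scores are made up.

```python
def normalize_by_top(raw_scores):
    """Scale one query's raw scores so that its best hit gets 1.0.

    Assumption: a simplified per-query normalization, not Lucene's formula.
    """
    top = max(raw_scores.values())
    return {doc: score / top for doc, score in raw_scores.items()}


def hits_above(scores, threshold):
    """Return the sorted ids of documents scoring at or above the threshold."""
    return sorted(doc for doc, score in scores.items() if score >= threshold)


# Hypothetical raw scores (doc id -> score) for two unrelated queries.
doug_raw = {1: 8.0, 2: 2.0, 3: 1.0, 4: 1.0, 5: 0.5}
chris_raw = {9: 0.5, 8: 0.125, 7: 0.0625, 6: 0.0625, 5: 0.03125}

doug_norm = normalize_by_top(doug_raw)
chris_norm = normalize_by_top(chris_raw)

# After per-query normalization the two score lists look identical,
# even though the raw scores differ by a factor of 16.
print(list(doug_norm.values()))
print(list(chris_norm.values()))

# A threshold on normalized scores keeps the same ranks for both queries...
print(hits_above(doug_norm, 0.2), hits_above(chris_norm, 0.2))

# ...while a threshold on the raw scores treats the two queries differently.
print(hits_above(doug_raw, 0.75), hits_above(chris_raw, 0.75))
```

Within each query the ordering and the ties survive normalization, which is what statements a and b rely on. The normalized numbers alone cannot support statements x and y, because the per-query scaling factor has been divided away.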