Date: Thu, 4 May 2006 14:44:40 -0700 (PDT)
From: Chris Hostetter
To: java-user@lucene.apache.org
Subject: Re: Newbie questions re: scoring

: 1) I create an index with one document with a searchable field of "All
: dogs are brown." If I search on that index with a query of "All dogs
: are brown."
: I do not get a hit with score 1.0, but something low like
: 0.38. I tried looking at the scoring algorithm and can't make heads or
: tails of it. Can anybody explain it to me in simple terms?

I've been using Lucene for about 16 months now, and I've never found a
simple way to explain the scoring. But a big factor you need to realize is
that there is a difference between the "raw" score and the normalized score.
If you use a HitCollector or TopDocs object you get the raw scores, which
are unconstrained. If you use a Hits object, your scores are normalized:
*if* the highest scoring document has a score above 1, all scores are
divided by the highest score; if the highest score is less than one,
nothing changes.

My best advice for understanding how scores are calculated is to look at
the toString() of an Explanation object from searcher.explain() for a
bunch of queries on a bunch of documents you know match, and think about
how those explanations correspond to the equation in the Similarity class
javadocs.

: 2) I have an index of documents, then run a search against it. I run
: through the list of hits, building a Vector of documents whose score is
: above a certain threshold. If I run the program with a threshold of
: say, 0.15, I'll get a Vector of documents with scores >= 0.15 (as
: expected). If I set the threshold higher (0.30, for example) and rerun
: the program, I see some of the same documents that I thought would have
: been trimmed off with the higher threshold. With a threshold of 0.15
: they would score 0.17, and with a threshold of 0.30 they are scoring
: something like 0.33. Can anybody explain this? My trimming is coming
: post-index-searching, so this is pretty confusing.

Are you doing this with the exact same index and Query each time?

1) That shouldn't happen ..
can you email some code that demonstrates this problem (ideally code that
builds a small index, then searches it and shows the same document
getting two different scores without the index changing)?

2) Independent of the scores being different, it is not safe to try to
pick a score threshold; this is mentioned in the FAQ...

http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-912c1f237bb00259185353182948e5935f0c2f03

-Hoss
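
To make the Hits normalization rule described in the reply concrete, here is a minimal sketch in plain Java. It is not Lucene's source code and uses no Lucene classes: the class name ScoreNormalizationSketch and the method normalize are hypothetical, and the only logic is the rule as stated above (divide every score by the top score, but only when the top score is above 1).

```java
import java.util.Arrays;

/** Illustrative only: mimics the normalization rule described in the reply; not Lucene's code. */
final class ScoreNormalizationSketch {

    /** Returns a copy of rawScores divided by the top score, but only when the top score exceeds 1. */
    static float[] normalize(float[] rawScores) {
        // Find the highest raw score (assumes scores are non-negative).
        float top = 0.0f;
        for (float s : rawScores) {
            top = Math.max(top, s);
        }
        float[] out = Arrays.copyOf(rawScores, rawScores.length);
        // Scores are rescaled only when the best hit scores above 1.
        if (top > 1.0f) {
            for (int i = 0; i < out.length; i++) {
                out[i] = out[i] / top;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Top score above 1: everything is divided by 4.0.
        System.out.println(Arrays.toString(normalize(new float[] {4.0f, 2.0f, 1.0f}))); // [1.0, 0.5, 0.25]
        // Top score below 1: nothing changes, so a lone hit can stay at 0.38.
        System.out.println(Arrays.toString(normalize(new float[] {0.38f}))); // [0.38]
    }
}
```

Under this rule a raw score of 2.0 normalizes to 0.5 when the best hit scores 4.0, but to 0.25 when the best hit scores 8.0. A normalized score is therefore relative to the top hit of that particular search, which is one reason a fixed threshold is hard to interpret.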