From: Donna L Gresh
To: java-user@lucene.apache.org
Date: Fri, 15 Jun 2007 13:09:28 -0400
Subject: Re: Lucene Search result (scoring)

Your examples are a little confusing to read. However, one thing you need to know is that, by default, the score depends on more than just the number of hits. It also depends on the length of the document the hits are in. For example, matching two words in a two-word document generates a higher score than matching the same two words in a one-hundred-word document.

To change the scoring, you will have to override some of the standard methods of the Similarity class. There are others on the mailing list much more qualified to discuss this than I am, as I am new myself.
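To make the length effect concrete, here is a minimal sketch in Java. It assumes the length normalization of Lucene's DefaultSimilarity, where a field's norm is 1/sqrt(number of terms in the field). The class LengthNormSketch is hypothetical and only models the arithmetic; Lucene also stores the norm in a single byte with some loss of precision, which this sketch ignores.

```java
/**
 * Hypothetical sketch of DefaultSimilarity-style length normalization:
 * norm = 1 / sqrt(numTerms). Lucene's lossy one-byte norm encoding is
 * deliberately left out.
 */
public class LengthNormSketch {

    // Assumed to mirror DefaultSimilarity.lengthNorm(fieldName, numTerms),
    // minus the field name argument.
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        float shortDoc = lengthNorm(2);   // two-word document
        float longDoc = lengthNorm(100);  // one-hundred-word document
        System.out.printf("norm, 2 terms:   %.4f%n", shortDoc);
        System.out.printf("norm, 100 terms: %.4f%n", longDoc);
        // With the same matches, tf and idf, the score scales with the norm,
        // so the short document scores sqrt(50) (roughly 7) times higher.
        System.out.printf("ratio: %.2f%n", shortDoc / longDoc);
    }
}
```

All else being equal, the score is proportional to this norm, which is why a short document with the same matches outranks a long one. The norm is the piece a custom Similarity would override.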
Donna Gresh

"Yatin Soni" wrote on 06/14/2007 12:53 PM:
(Please respond to java-user@lucene.apache.org)
Subject: Lucene Search result (scoring)

Hi,

We are using Lucene as our search engine, and I have a question about the scoring of search results. Here is an example.

Suppose we have indexed four items:

////////////////////////////////////////////////////////////
1) demo  --> contents: Jumbo Numbers
2) demo2 --> contents: Jumbo Numbers
3) demo3 --> contents: Jumbo Numbers
4) demo4 --> contents: Jumbo Numbers

Search results for the query "Jumbo":
demo:  Score: 0.3884282
demo2: Score: 0.3884282
demo3: Score: 0.3884282
demo4: Score: 0.3884282
////////////////////////////////////////////////////////////

Test 1 >> Change the contents of the items:
1) demo  --> contents: Jumbo Numbers Stamps Numbers
2) demo2 --> contents: Jumbo Numbers Stamps Jumbo
3) demo3 --> contents: Jumbo Numbers Numbers Test
4) demo4 --> contents: Jumbo Numbers Stamps Jumbo

Search results for the query "Jumbo":
demo2: Score: 0.54932046
demo4: Score: 0.54932046
demo:  Score: 0.3884282
demo3: Score: 0.3884282

# In Test 1, items demo2 and demo4 each contain two occurrences of "Jumbo", while demo and demo3 contain only one, so the results are ordered by score as expected.
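The Test 1 numbers can be reproduced by hand. The following sketch assumes the formula used by Lucene's DefaultSimilarity in the 2.x releases (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), norm = 1/sqrt(numTerms)), and that for a single-term query the query normalization cancels one of the two idf factors, leaving score = tf * idf * norm. Test1Scores and its methods are hypothetical names.

```java
/**
 * Hypothetical reproduction of the Test 1 scores, assuming the
 * DefaultSimilarity formula for a single-term query:
 *   score = sqrt(freq) * (1 + ln(numDocs / (docFreq + 1))) / sqrt(numTerms)
 * Norm byte-encoding is ignored; for these 4-term fields the norm is 0.5.
 */
public class Test1Scores {

    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    static double norm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    static double score(int freq, int numTerms, int docFreq, int numDocs) {
        return tf(freq) * idf(docFreq, numDocs) * norm(numTerms);
    }

    public static void main(String[] args) {
        // All four items contain "Jumbo", so docFreq = numDocs = 4,
        // and every Test 1 item has 4 terms.
        System.out.printf("demo  (1 x Jumbo): %.7f%n", score(1, 4, 4, 4));
        System.out.printf("demo2 (2 x Jumbo): %.7f%n", score(2, 4, 4, 4));
    }
}
```

With docFreq = numDocs = 4, idf is about 0.7769, and a 4-term field has a norm of 0.5, giving about 0.3884282 for one occurrence and sqrt(2) times that (about 0.549320) for two, in line with the reported scores. For a real hit, Searcher.explain(query, docId) should print the same kind of breakdown.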
////////////////////////////////////////////////////////////
Test 2 >> Change the contents of the items:
1) demo  --> contents: Jumbo Numbers Stamps Numbers
2) demo2 --> contents: Jumbo Numbers Stamps Jumbo Parquetry Block Super Set
3) demo3 --> contents: Jumbo Numbers Numbers Test
4) demo4 --> contents: Jumbo Numbers Stamps Jumbo

Search results for the query "Jumbo":
demo4: Score: 0.5493204
demo:  Score: 0.3884282
demo3: Score: 0.3884282
demo2: Score: 0.3433253

# In Test 2 we changed only the contents of demo2, and searching for the same query "Jumbo" now gives different results from Test 1. The score for demo2 is now lower than the scores of demo and demo3, even though demo2 contains more occurrences of "Jumbo" than either of them.
////////////////////////////////////////////////////////////

So the search results come back in this order:
1. demo4: Score: 0.5493204
2. demo:  Score: 0.3884282
3. demo3: Score: 0.3884282
4. demo2: Score: 0.3433253

but we expect this order:
1. demo4
2. demo2
3. demo
4. demo3

OR

1. demo2
2. demo4
3. demo
4. demo3

because demo2 contains two occurrences of "Jumbo", more than demo and demo3.

I have sorted the results by RELEVANCE, but they still come back in the same order. So my QUESTION is: can we get the desired order based on the number of occurrences of a particular word? Can we make demo2 rank above demo and demo3 in the search results?

I need help with this issue.

Thanks,
Yatin
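The same model also accounts for Test 2. demo2 now has 8 terms, so its norm is 1/sqrt(8) and its score is sqrt(2) * idf / sqrt(8) = idf * 0.5, a tie with demo and demo3 before rounding. The reported 0.3433253 corresponds to an effective norm of 0.3125 rather than about 0.354, which is consistent with the lossy one-byte norm encoding pushing demo2 just below them. The sketch below (hypothetical class Test2Ranking, naive whitespace tokenization, no lowercasing, norm encoding ignored) compares the ranking with and without length normalization.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical model of the Test 2 ranking using tf * idf * norm, with the
 * length norm either on (1/sqrt(numTerms)) or off (constant 1.0). This
 * models the arithmetic only; it is not Lucene itself.
 */
public class Test2Ranking {

    // idf as assumed for DefaultSimilarity: 1 + ln(numDocs / (docFreq + 1))
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // The Test 2 items, in the order they were indexed.
    static Map<String, String> test2Docs() {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("demo", "Jumbo Numbers Stamps Numbers");
        docs.put("demo2", "Jumbo Numbers Stamps Jumbo Parquetry Block Super Set");
        docs.put("demo3", "Jumbo Numbers Numbers Test");
        docs.put("demo4", "Jumbo Numbers Stamps Jumbo");
        return docs;
    }

    // Single-term score for each document: sqrt(freq) * idf * norm.
    static Map<String, Double> scores(Map<String, String> docs, String term,
                                      boolean useLengthNorm) {
        int docFreq = 0;
        for (String content : docs.values()) {
            if (Arrays.asList(content.split("\\s+")).contains(term)) {
                docFreq++;
            }
        }
        double idf = idf(docFreq, docs.size());
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : docs.entrySet()) {
            String[] terms = e.getValue().split("\\s+");
            int freq = 0;
            for (String t : terms) {
                if (t.equals(term)) {
                    freq++;
                }
            }
            double norm = useLengthNorm ? 1.0 / Math.sqrt(terms.length) : 1.0;
            result.put(e.getKey(), Math.sqrt(freq) * idf * norm);
        }
        return result;
    }

    // Names by descending score; List.sort is stable, so ties keep index order.
    static List<String> ranking(Map<String, Double> scores) {
        List<String> names = new ArrayList<>(scores.keySet());
        names.sort(Comparator.comparing((String n) -> scores.get(n)).reversed());
        return names;
    }

    public static void main(String[] args) {
        Map<String, String> docs = test2Docs();
        System.out.println("scores with length norm: " + scores(docs, "Jumbo", true));
        System.out.println("ranking without length norm: "
                + ranking(scores(docs, "Jumbo", false)));
    }
}
```

Without the norm, the model ranks demo2, demo4, demo, demo3, which is one of the orders asked for. In Lucene 2.x it should be possible to get this either by subclassing DefaultSimilarity so that lengthNorm(String fieldName, int numTerms) returns 1.0f and installing it with IndexWriter.setSimilarity and Searcher.setSimilarity, or by calling Field.setOmitNorms(true) on the field. Either way the index has to be rebuilt, because norms are computed at indexing time; please check the javadocs for your release. The trade-off is that long documents are then no longer penalized for any query on that field.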