lucene-java-user mailing list archives

From Donna L Gresh <gr...@us.ibm.com>
Subject Re: Lucene Search result (scoring )
Date Fri, 15 Jun 2007 17:09:28 GMT
Your examples are a little confusing to read. However, one thing I think 
you need to know is that the score (by default) depends on more than just 
the number of times the term occurs in a document. It also depends on the 
length of the document (more precisely, of the field) the hits are in. For 
example, matching two words in a two-word-long document will generate a 
higher score than matching the same two words in a one-hundred-word-long 
document. To change the scoring, you will have to override some of the 
standard scoring methods (in Lucene's Similarity class). There are people 
on the mailing list much more qualified to discuss this than I am, as I am 
new myself.
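The length effect described above can be sketched in plain Java. This is a minimal sketch, assuming the classic Lucene 2.x DefaultSimilarity formula for a one-term query: tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), lengthNorm = 1 / sqrt(numTerms), with the norm stored in a single byte. The class ClassicScoreSketch and its methods are hypothetical and do not call Lucene; the one-byte norm encoding is approximated by truncating the float's mantissa.

```java
// Hypothetical stand-alone sketch (not Lucene code) of classic
// DefaultSimilarity scoring for a single-term query, to show how
// document length affects the score.
public class ClassicScoreSketch {

    // tf(freq) = sqrt(freq): repeated occurrences help, with diminishing returns.
    static float tf(int freq) {
        return (float) Math.sqrt(freq);
    }

    // idf = 1 + ln(numDocs / (docFreq + 1)): rarer terms weigh more.
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // lengthNorm = 1 / sqrt(numTerms): longer fields score lower.
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Approximation of the one-byte norm storage: keep the sign, the exponent
    // and the top 2 explicit mantissa bits, truncating the rest
    // (e.g. 1/sqrt(8) = 0.354 becomes 0.3125). Range clamping is ignored.
    static float encodeDecodeNorm(float norm) {
        int bits = Float.floatToRawIntBits(norm);
        return Float.intBitsToFloat(bits & 0xFFE00000);
    }

    // For a one-term query with boost 1, queryNorm cancels one idf factor,
    // leaving score = tf * idf * fieldNorm.
    static float score(int freq, int numTerms, int docFreq, int numDocs) {
        float fieldNorm = encodeDecodeNorm(lengthNorm(numTerms));
        return tf(freq) * idf(docFreq, numDocs) * fieldNorm;
    }

    public static void main(String[] args) {
        // Same term, four documents that all contain it, varying length and frequency.
        System.out.println("2 hits in 4 terms: " + score(2, 4, 4, 4));
        System.out.println("2 hits in 8 terms: " + score(2, 8, 4, 4));
        System.out.println("1 hit  in 4 terms: " + score(1, 4, 4, 4));
    }
}
```

Under these assumptions, score(2, 8, 4, 4) comes out at about 0.34333 and score(1, 4, 4, 4) at about 0.38843, in line with the scores reported for demo2 and demo in the quoted Test 2: doubling the length costs more than the extra occurrence gains.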

Donna Gresh




"Yatin Soni" <yatinsoni@interinfosystems.com> 
06/14/2007 12:53 PM
Please respond to java-user@lucene.apache.org

To: <java-user@lucene.apache.org>
cc:
Subject: Lucene Search result (scoring )

Hi,

    We are using Lucene as our search engine, and I have a question about 
the scoring of search results. Here is an example:

Example:

Suppose we have indexed four items:
////////////////////////////////////////////////////////////////////////////////////////////////////////////////
1)demo --> contents: Jumbo Numbers
2)demo2 --> contents: Jumbo Numbers
3)demo3 --> contents: Jumbo Numbers
4)demo4 --> contents: Jumbo Numbers

Search results for the query "Jumbo":

demo:  Score: 0.3884282
demo2: Score: 0.3884282
demo3: Score: 0.3884282
demo4: Score: 0.3884282
////////////////////////////////////////////////////////////////////////////////////////////////////////////////
Test 1 >> Change the contents of the items

1)demo --> contents: Jumbo Numbers Stamps Numbers
2)demo2 --> contents: Jumbo Numbers Stamps Jumbo
3)demo3 --> contents: Jumbo Numbers Numbers Test
4)demo4 --> contents: Jumbo Numbers Stamps Jumbo

Search results for the query "Jumbo":

demo2: Score: 0.54932046
demo4: Score: 0.54932046
demo:  Score: 0.3884282
demo3: Score: 0.3884282

# In Test 1, items demo2 and demo4 each contain two occurrences of 
"Jumbo", while the other items (demo and demo3) contain only one, 
  so the results are ranked by score as expected.
////////////////////////////////////////////////////////////////////////////////////////////////////////////////
Test 2 >> Change the contents of the items

1)demo --> contents: Jumbo Numbers Stamps Numbers
2)demo2 --> contents: Jumbo Numbers Stamps Jumbo Parquetry Block Super Set
3)demo3 --> contents: Jumbo Numbers Numbers Test
4)demo4 --> contents: Jumbo Numbers Stamps Jumbo

Search results for the query "Jumbo":

demo4: Score: 0.5493204
demo:  Score: 0.3884282
demo3: Score: 0.3884282
demo2: Score: 0.3433253

# In Test 2 we changed only the contents of demo2 and ran the same 
query, "Jumbo". The results differ from Test 1: demo2 now scores lower 
  than demo and demo3, even though those items contain fewer occurrences 
  of "Jumbo" than demo2 does.
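A worked check of the Test 2 numbers, assuming the classic Lucene 2.x DefaultSimilarity formula for a one-term query (score = sqrt(freq) x idf x fieldNorm, with queryNorm cancelling one idf factor), an index holding just these four documents, all of which contain "Jumbo", and field lengths of 4 and 8 tokens. Under those assumptions the arithmetic reproduces the reported scores: 1/sqrt(8) = 0.354 is stored in a one-byte norm as 0.3125, and that length penalty outweighs demo2's second occurrence.

```latex
\begin{aligned}
\mathrm{score}(d) &= \sqrt{\mathit{freq}(d)} \cdot \mathrm{idf} \cdot \mathrm{norm}(d),
  \qquad \mathrm{idf} = 1 + \ln\frac{4}{4 + 1} \approx 0.77686 \\
\text{demo, demo3 (1 hit, 4 terms)}:\quad &\sqrt{1} \cdot 0.77686 \cdot \tfrac{1}{\sqrt{4}} = 0.77686 \cdot 0.5 \approx 0.38843 \\
\text{demo4 (2 hits, 4 terms)}:\quad &\sqrt{2} \cdot 0.77686 \cdot 0.5 \approx 0.54932 \\
\text{demo2 (2 hits, 8 terms)}:\quad &\sqrt{2} \cdot 0.77686 \cdot 0.3125 \approx 0.34333
\end{aligned}
```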
////////////////////////////////////////////////////////////////////////////////////////////////////////////////

With the Test 2 contents, the search results come back in the following 
order:
1. demo4: Score: 0.5493204
2. demo:  Score: 0.3884282
3. demo3: Score: 0.3884282
4. demo2: Score: 0.3433253

but we expect the following order:
1. demo4
2. demo2
3. demo
4. demo3

OR

1. demo2
2. demo4
3. demo
4. demo3

because demo2 contains two occurrences of "Jumbo", more than demo and 
demo3. I also tried sorting the results by RELEVANCE, but they still came 
back in the same order.

So my QUESTION is: can we get the desired order based on the number of 
occurrences of a particular word? Can we make demo2 rank above demo and 
demo3 in the search results?
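For a one-term query, the only factor besides term frequency that differs between demo2 and demo/demo3 is the length norm, so ranking by occurrence count amounts to fixing that norm at 1.0. (Sorting by relevance sorts by this same score, which is why it did not change the order.) In the Lucene 2.x API the usual hooks are a DefaultSimilarity subclass whose lengthNorm(String fieldName, int numTerms) returns 1.0f, set on both the IndexWriter and the Searcher, or indexing the field with norms omitted via Field.setOmitNorms(true); norms are written at index time, so either way requires reindexing. Check the Javadoc of the Lucene version in use. The sketch below does not call Lucene: it is a hypothetical plain-Java re-scoring of the Test 2 data under that assumption, with whitespace tokenization standing in for an analyzer.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch (plain Java, not Lucene): rank documents for a single
// term with the length norm fixed at 1.0, i.e. score = sqrt(freq) * idf.
public class NoLengthNormSketch {

    static final class Doc {
        final String name;
        final String contents;

        Doc(String name, String contents) {
            this.name = name;
            this.contents = contents;
        }
    }

    // Whitespace tokenization with case-insensitive matching; a rough
    // stand-in for a real analyzer.
    static int termFreq(String contents, String term) {
        int count = 0;
        for (String token : contents.split("\\s+")) {
            if (token.equalsIgnoreCase(term)) {
                count++;
            }
        }
        return count;
    }

    static List<String> rank(List<Doc> docs, String term) {
        int docFreq = 0;
        for (Doc d : docs) {
            if (termFreq(d.contents, term) > 0) {
                docFreq++;
            }
        }
        // Same idf for every document, so it scales scores without reordering them.
        double idf = Math.log(docs.size() / (double) (docFreq + 1)) + 1.0;

        List<Doc> sorted = new ArrayList<>(docs);
        // Highest score first; List.sort is stable, so ties keep their input order.
        sorted.sort(Comparator.comparingDouble(
                (Doc d) -> -Math.sqrt(termFreq(d.contents, term)) * idf));

        List<String> names = new ArrayList<>();
        for (Doc d : sorted) {
            names.add(d.name);
        }
        return names;
    }

    // The Test 2 contents from the message above.
    static List<Doc> test2Docs() {
        return List.of(
                new Doc("demo", "Jumbo Numbers Stamps Numbers"),
                new Doc("demo2", "Jumbo Numbers Stamps Jumbo Parquetry Block Super Set"),
                new Doc("demo3", "Jumbo Numbers Numbers Test"),
                new Doc("demo4", "Jumbo Numbers Stamps Jumbo"));
    }

    public static void main(String[] args) {
        System.out.println(rank(test2Docs(), "Jumbo"));
    }
}
```

Running main prints [demo2, demo4, demo, demo3], the second of the two orders requested above. The trade-off is that, without length normalization, long documents that mention a term many times will tend to outrank short, focused ones.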

I would appreciate any help with this issue.

Thanks,
Yatin



