lucene-general mailing list archives

From adakos <jblo...@recruititassociates.com>
Subject Scoring modification question
Date Wed, 21 Jan 2009 10:09:30 GMT

Hello again!

Just a quick question.

At present I am unable to alter the way Lucene scores documents (my
knowledge of how to go about this is fairly limited).

I understand that documents are scored based on the number of hits, and that
value is modified by the number of words in the document.

So, for example, a 10-word document containing the word dog 5 times will be
scored higher than a 120-word document containing the word dog 50 times.
The scoring is also modified relative to all the other documents.
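
The effect described above can be reproduced with a small sketch. It
assumes the default similarity of Lucene releases from this period, where the
term-frequency factor is sqrt(hits) and the length norm is
1 / sqrt(number of terms in the field). It ignores idf, boosts, the
coordination factor and the lossy one-byte encoding of norms, so the numbers
are illustrative only. The function name is made up for this example.

```python
import math

def simplified_score(hits, doc_length):
    """Toy per-term score: sqrt(hits) scaled by 1 / sqrt(doc_length).

    Assumption: mirrors only the tf and lengthNorm factors of Lucene's
    default similarity; idf, boosts and coord are left out.
    """
    tf = math.sqrt(hits)
    length_norm = 1.0 / math.sqrt(doc_length)
    return tf * length_norm

short_doc = simplified_score(5, 10)     # "dog" 5 times in 10 words
long_doc = simplified_score(50, 120)    # "dog" 50 times in 120 words
print(f"short: {short_doc:.3f}, long: {long_doc:.3f}")
```

Under these assumptions the short document comes out ahead (roughly 0.71
versus 0.65), even though the long document has ten times as many hits.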

Essentially what I want to do, and I would have thought it would have been
relatively easy, is to remove the process of modifying the score based on
the number of words in the document.

What I want is for the search to return the documents that contain the most
hits, regardless of the size of the document.
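
As a sketch of the desired behaviour, the snippet below ranks documents
purely by raw hit count and ignores document length. The document names and
the (hits, length) pairs are hypothetical, taken from the dog example
above. This is plain Python, not Lucene API.

```python
# Hypothetical data: name -> (hits for "dog", total words in the document)
docs = {
    "short_doc": (5, 10),
    "long_doc": (50, 120),
}

def rank_by_raw_hits(docs):
    """Return document names ordered by hit count, highest first."""
    return sorted(docs, key=lambda name: docs[name][0], reverse=True)

ranking = rank_by_raw_hits(docs)
print(ranking)
```

Here the 120-word document is listed first because only the hit count is
used as the sort key.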

Right now I am having to implement a very costly and time-consuming regex
parsing system that scans the results returned by the Lucene index.

My scoring method is demonstrated below.

Example: the user searches for dog and cat and fish

We find 5 documents matching these criteria; below are the hits for each
document (one row per document; columns are dog, cat, fish)

1 1 1
4 0 2
3 3 3
5 0 5
2 4 2

So what we do is take each hit count for each document and divide it by the
highest hit count encountered for that criterion.

First document scoring (numbers in (brackets) are the highest hit value for
that criterion)

(1 / (5)) + (1 / (4)) + (1 / (5)) = 0.65

To get the final score we divide 0.65 by 3 and multiply by 100 (for
percentage) which gives us ~ 21.7%

And the next document

(4 / (5)) + (0 / (4)) + (2 / (5)) = 1.20    ->     (1.2/3)*100= 40.0%

And the third

(3 / (5)) + (3 / (4)) + (3 / (5)) = 1.95    ->     (1.95/3)*100= 65.0%
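
The method described above can be written as a short sketch. It uses the
five hit rows from the example, assumes the columns are dog, cat, fish in
that order, and treats a term with no hits anywhere as contributing zero (an
assumption; the post does not cover that case). The function name is
illustrative.

```python
# Hit counts per document; columns are assumed to be dog, cat, fish.
hits = [
    [1, 1, 1],
    [4, 0, 2],
    [3, 3, 3],
    [5, 0, 5],
    [2, 4, 2],
]

def percentage_scores(hits):
    """Score each document as the mean of hits / column maximum, times 100."""
    n_terms = len(hits[0])
    # Highest hit count seen for each search term across all documents.
    maxima = [max(row[i] for row in hits) for i in range(n_terms)]
    scores = []
    for row in hits:
        # A term that no document matched contributes 0 (avoids dividing by zero).
        total = sum(h / m if m else 0.0 for h, m in zip(row, maxima))
        scores.append(total / n_terms * 100)
    return scores

scores = percentage_scores(hits)
for doc_number, score in enumerate(scores, start=1):
    print(f"document {doc_number}: {score:.1f}%")
```

Running it prints one percentage per document, evaluating the formula
exactly as written for all five rows.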


And that's it. I guess my question is: is it possible to modify Lucene
scoring like this?

Thanks for your time


Mime
View raw message