lucene-java-user mailing list archives

From Chris Hostetter
Subject Re: How to het the score in percentage
Date Fri, 01 May 2009 21:52:31 GMT

: in our existing system we are showing the search score as a
: percentage, but lucene provides the search score as a number which is derived
: from some internal logic. Can anybody give some tips for converting the
: lucene score to a percentage, or is there any way to retrieve the score as a
: percentage from a lucene search?

there is an extremely important and fundamental question you have to 
answer when you say you want "the score as a percentage" ... 

	A percentage of what exactly?

As Erick has pointed out: it's easy to convert a document's numeric score 
into a percentage of the highest scoring document using division ... but 
that is just as meaningless as dividing by pi.  Consider the following:

   query = apple eclipse zzz yyy xxx qqq kkk ttt rrr

Now imagine that three "documents" match this query, with the arbitrary 
scores listed...

   2.345 doc1: apple banana
  16.415 doc2: zzz yyy xxx qqq kkk ttt rrr 111
   2.345 doc3: eclipse moon sun

(we're ignoring idf and norms for the moment).  obviously doc2 has the 
highest score, and is in fact a *very* good match for your query, so it's 
fine that it gets a score of 100%.  doc1 and doc3 each get scores of 14%
(2.345 / 16.415).

now what happens if doc2 gets deleted from my index?

doc1 and doc3 both still match the query, and they now both tie for the 
"highest" score, so now suddenly they have scores of 100%.

This is going to confuse the hell out of your users.  the query hasn't 
changed, doc1 and doc3 didn't change.  doc1 didn't suddenly become 7 
times more relevant to your query than it was 5 minutes ago.
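
To make the arithmetic concrete, here is a minimal sketch in plain Java. 
It does not use any Lucene API: the class and method names are made up for 
illustration, and the raw scores are simply the arbitrary ones from the 
example above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene.
class PercentOfTopScore {

    // Divide every raw score by the highest score in the same result set
    // and round to a whole percentage.
    static Map<String, Long> percentOfTop(Map<String, Double> rawScores) {
        double top = 0.0;
        for (double s : rawScores.values()) {
            top = Math.max(top, s);
        }
        Map<String, Long> percents = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : rawScores.entrySet()) {
            long pct = (top > 0.0) ? Math.round(100.0 * e.getValue() / top) : 0L;
            percents.put(e.getKey(), pct);
        }
        return percents;
    }

    public static void main(String[] args) {
        // The arbitrary scores from the example above.
        Map<String, Double> hits = new LinkedHashMap<>();
        hits.put("doc1", 2.345);
        hits.put("doc2", 16.415);
        hits.put("doc3", 2.345);
        System.out.println("before delete: " + percentOfTop(hits));
        // prints: before delete: {doc1=14, doc2=100, doc3=14}

        // Same query, same doc1 and doc3; only doc2 is gone.
        hits.remove("doc2");
        System.out.println("after delete:  " + percentOfTop(hits));
        // prints: after delete:  {doc1=100, doc3=100}
    }
}
```

The raw scores of doc1 and doc3 are identical in both runs; only the 
denominator changed, which is why the percentages jump.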

you might say: "i'm never going to delete documents", but are you ever 
going to add documents?  because if so you're going to have the same 
problem.  if you never modify your index at all ... well then i envy you, 
but you're still going to have a problem -- namely that people are going 
to look at a docA that scores 92% against queryX and a different docB that 
scores 68% against queryY and say "WTF? docB is a much better match for 
queryY than docA is for queryX!?!?"

score values are meaningful only for purposes of comparison with other 
documents for the exact same query and the exact same index.  when you try 
to compute a percentage, you are setting up an implicit comparison with 
scores from other queries.
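
A small sketch of that implicit comparison, again in plain Java with no 
Lucene API involved.  The class name, the method name, and all four raw 
scores are invented purely for illustration; the point is only that each 
percentage is computed against a different top score.

```java
// Hypothetical helper, not part of Lucene.
class CrossQueryPercent {

    // Percentage of the top score within ONE result list.
    static long percentOfTop(double score, double topScore) {
        return Math.round(100.0 * score / topScore);
    }

    public static void main(String[] args) {
        // Invented numbers: the top hit for queryX scores 0.50,
        // the top hit for queryY scores 12.0.
        double topX = 0.50, docA = 0.46;
        double topY = 12.0, docB = 8.16;

        System.out.println("docA vs queryX: " + percentOfTop(docA, topX) + "%");
        // prints: docA vs queryX: 92%
        System.out.println("docB vs queryY: " + percentOfTop(docB, topY) + "%");
        // prints: docB vs queryY: 68%

        // 92% and 68% look comparable, but each one is relative to a
        // different denominator from a different query, so putting them
        // side by side says nothing about which match is better.
    }
}
```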

