lucene-java-user mailing list archives

From Ahmet Arslan <iori...@yahoo.com>
Subject Re: Search Score percentage, Should not be relative to the highest score
Date Mon, 03 Jan 2011 16:34:23 GMT

So, if you had something that gave you the "how many query terms matched"
info, would that satisfy your requirement?

Query: term1 term2

Doc1: term1 term2   => n=2 => 100%
Doc2: term1 term2 term3 term4 => n=2 => 100%
Doc3: term1 term1 term3   => n=1 => 50%
Doc4: term2 term3 term4   => n=1 => 50%
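
The percentages above can be reproduced with a small sketch. It is plain Java, not Lucene code: the class name MatchedTermPercent and the whitespace tokenization are assumptions made for illustration only. It counts how many distinct query terms occur at least once in a document and ignores term frequency.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper for illustration; not part of Lucene.
final class MatchedTermPercent {

    // Percentage of distinct query terms that occur at least once in the
    // document. Terms are split on whitespace (an assumption; a real index
    // would use the field's analyzer).
    static int percent(String query, String doc) {
        Set<String> queryTerms =
                new LinkedHashSet<>(Arrays.asList(query.trim().split("\\s+")));
        Set<String> docTerms =
                new HashSet<>(Arrays.asList(doc.trim().split("\\s+")));
        int matched = 0;
        for (String term : queryTerms) {
            if (docTerms.contains(term)) {
                matched++;
            }
        }
        return Math.round(100f * matched / queryTerms.size());
    }

    public static void main(String[] args) {
        String query = "term1 term2";
        String[] docs = {
            "term1 term2",
            "term1 term2 term3 term4",
            "term1 term1 term3",
            "term2 term3 term4"
        };
        for (int i = 0; i < docs.length; i++) {
            System.out.println("Doc" + (i + 1) + ": " + percent(query, docs[i]) + "%");
        }
    }
}
```

Because the value depends only on the query and the document itself, it does not change when a better-matching document is added to or removed from the result set.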


If yes, Explanation will give you that info in the coord part. For example, coord(1/3) means
one query term matched out of a total of 3 query terms.

Here is an example Explanation:

0.013397463 = (MATCH) product of:
  0.040192388 = (MATCH) sum of:
    0.040192388 = (MATCH) weight(pagetext:para in 34930), product of:
      0.46250778 = queryWeight(pagetext:para), product of:
        3.1780937 = idf(docFreq=5546, maxDocs=48977)
        0.14552994 = queryNorm
      0.086901 = (MATCH) fieldWeight(pagetext:para in 34930), product of:
        1.0 = tf(termFreq(pagetext:para)=1)
        3.1780937 = idf(docFreq=5546, maxDocs=48977)
        0.02734375 = fieldNorm(field=pagetext, doc=34930)
  0.33333334 = coord(1/3)
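
A minimal sketch of pulling the percentage out of such an explanation follows. It works on the explanation text only, so it runs without Lucene on the classpath. The class name CoordPercent and the -1 "not found" convention are assumptions for illustration. In a real application the text would come from the Explanation returned by IndexSearcher.explain(query, docId), for example via its toString(). The sketch also assumes a flat query with a single coord line. As far as I know, the coord line can be absent when every query term matched, so a caller may want to treat -1 as "check the score another way" rather than as an error.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Lucene.
final class CoordPercent {

    // Matches the "coord(matched/total)" fragment of an explanation line.
    private static final Pattern COORD =
            Pattern.compile("coord\\((\\d+)/(\\d+)\\)");

    // Returns the matched-term percentage from the first coord(...) entry
    // in the text, or -1 if no usable coord entry is present.
    static int fromExplanation(String explanationText) {
        Matcher m = COORD.matcher(explanationText);
        if (!m.find()) {
            return -1;
        }
        int matched = Integer.parseInt(m.group(1));
        int total = Integer.parseInt(m.group(2));
        if (total == 0) {
            return -1;
        }
        return Math.round(100f * matched / total);
    }

    public static void main(String[] args) {
        String text = "0.013397463 = (MATCH) product of:\n"
                + "  0.040192388 = (MATCH) sum of:\n"
                + "  0.33333334 = coord(1/3)\n";
        System.out.println(fromExplanation(text) + "%");
    }
}
```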



--- On Mon, 1/3/11, Amr ElAdawy <Amr.ElAdawy@etisalat.com> wrote:

> From: Amr ElAdawy <Amr.ElAdawy@etisalat.com>
> Subject: Re: Search Score percentage, Should not be relative to the highest score
> To: java-user@lucene.apache.org
> Date: Monday, January 3, 2011, 3:09 PM
> 
> Consider the following.
> 
> Query: term1 term2
> Doc1: term1 term2
> Doc2: term1 term2 term3 term4 
> Doc3: term1 term1 term3
> Doc4: term3 term4
> 
> For the above documents, Doc1 and Doc2 will be exact matches
> (as they contain all the terms in the search query). Doc3 is a
> partial match, as it contains term1 only (we neglect the term
> frequency; tf is always 1).
> 
> 
> The score percentage (calculated by Lucene in Hits.java,
> line 133) will be
> 
> Doc1: 100%
> Doc2: 100%
> Doc3:  80%
> 
> This is not a problem at all. The problem occurs when there
> is no exactly matching document, as in the following:
> 
> Query: term1 term2
> Doc1: term1 term3
> Doc2: term2  term3 term4 
> Doc3: term1 term1 term3
> Doc4: term3 term4
> 
> 
> The score will be calculated as 
> 
> Doc1: 100%
> Doc2: 100%
> Doc3:  50%
> 
> You can see that Doc1 and Doc2 got 100% even though they
> are not exact matches. Because they got the highest score,
> Lucene considers them a 100% match.
> 
> This is my problem
> 
> All I need is to make the percentage correct in the second
> case, so that it will be something like
> 
> Doc1: 50% 
> Doc2: 50%
> Doc3:  30%
> 
> I hope I made myself clear.
> 
> 
> -- 
> View this message in context: http://lucene.472066.n3.nabble.com/Search-Score-percentage-Should-not-be-relative-to-the-highest-score-tp2183420p2184613.html
> Sent from the Lucene - Java Users mailing list archive at
> Nabble.com.
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 


      
