lucene-java-user mailing list archives

From Gururaja H <guru_h...@yahoo.com>
Subject Re: Relevance percentage
Date Tue, 21 Dec 2004 06:59:32 GMT
Thanks much for the reply.

Paul Elschot <paul.elschot@xs4all.nl> wrote:
On Monday 20 December 2004 15:09, Gururaja H wrote:
> Hi,
> 
> But how do I calculate the coord() fraction? I know that by default,
> DefaultSimilarity defines the coord() fraction as follows:
> 
> /** Implemented as overlap / maxOverlap. */
> public float coord(int overlap, int maxOverlap) {
>   return overlap / (float)maxOverlap;
> }
> 
> How do I get the overlap and maxOverlap values for each of the matched
> documents?

In case you only want the coordination factor to have more influence
in the order of your search results you can use a Similarity with
a coord() function that has a power higher than 1:

public float coord(int overlap, int maxOverlap) {
  // Raise the default overlap / maxOverlap fraction to SOME_POWER (> 1)
  return (float) Math.pow((overlap / (float)maxOverlap), SOME_POWER);
}

I'd first try values between 3.0f and 5.0f for SOME_POWER.
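To see how much a power changes the coordination factor, the sketch below tabulates the default overlap / maxOverlap against a powered version for a 5-clause query. It deliberately avoids the Lucene jar: in Lucene itself the powered formula would go in the coord() method of a DefaultSimilarity subclass installed on the searcher (for example via Searcher.setSimilarity). The class name CoordPower is hypothetical, and power = 4.0f is just one value picked from the 3.0f to 5.0f range suggested above.

```java
/**
 * Hypothetical demo class: compares the default coord() formula
 * (overlap / maxOverlap) with a power-boosted variant, without Lucene.
 */
public class CoordPower {

    /** Same formula as DefaultSimilarity.coord(): fraction of query clauses matched. */
    static float defaultCoord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    /** Power-boosted variant; 'power' plays the role of SOME_POWER. */
    static float poweredCoord(int overlap, int maxOverlap, float power) {
        return (float) Math.pow(overlap / (float) maxOverlap, power);
    }

    public static void main(String[] args) {
        int maxOverlap = 5;   // e.g. a query with 5 optional terms
        float power = 4.0f;   // assumption: one value from the 3.0f-5.0f range
        System.out.println("overlap  default  power=" + power);
        for (int overlap = 1; overlap <= maxOverlap; overlap++) {
            System.out.printf("%7d  %7.4f  %7.4f%n",
                    overlap,
                    defaultCoord(overlap, maxOverlap),
                    poweredCoord(overlap, maxOverlap, power));
        }
    }
}
```

With power 4.0f, a document matching 4 of 5 clauses gets 0.4096 instead of 0.8, so a full match now outweighs it by a factor of about 2.44 rather than 1.25; a 2-of-5 match drops from 0.4 to 0.0256.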

The searching code precomputes all coord values once per query
per search, so there is no need to worry about computational efficiency.
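The precomputation mentioned above amounts to a lookup table: coord() is evaluated once for every possible overlap from 0 to maxOverlap, and each matching document then only needs an array lookup. The sketch below illustrates that idea; CoordTable, precompute() and the fixed power of 4.0 are hypothetical and not Lucene's internal code.

```java
/**
 * Hypothetical sketch of precomputing coord() once per query, so that
 * scoring a document only needs an array lookup.
 */
public class CoordTable {

    // Stand-in for a Similarity.coord() with an assumed power of 4.0.
    static float coord(int overlap, int maxOverlap) {
        return (float) Math.pow(overlap / (float) maxOverlap, 4.0);
    }

    /** Evaluates coord for every possible overlap 0..maxOverlap, once per query. */
    static float[] precompute(int maxOverlap) {
        float[] factors = new float[maxOverlap + 1];
        for (int i = 0; i <= maxOverlap; i++) {
            factors[i] = coord(i, maxOverlap);
        }
        return factors;
    }

    public static void main(String[] args) {
        float[] factors = precompute(5);        // query with 5 optional clauses
        int[] matchedClauses = {2, 4, 4, 5};    // matched-clause counts of some hits
        for (int overlap : matchedClauses) {
            // Per document: no pow() call, just a table lookup.
            System.out.println(overlap + " -> " + factors[overlap]);
        }
    }
}
```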

This has the advantage that the other scoring factors are still used
for ranking.

Since the other factors can vary quite a bit, it is difficult to guarantee
that any coord() implementation will provide a score that sorts by the
number of matching clauses. Higher powers as above can go
a long way, though.
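A small worked example shows why a higher power helps but guarantees nothing. It uses a simplified model, assuming score = coord * rawSum, where rawSum stands for all the other factors (tf, idf, norms, boosts) summed over the matching clauses; the rawSum values and the class name CoordRanking are made up for illustration. Document A matches all 5 clauses with common, low-weight terms; document B matches only 4 with rare, high-weight terms.

```java
/**
 * Simplified model (assumption): score = coord(overlap, maxOverlap) * rawSum,
 * where rawSum stands in for all other scoring factors. Values are hypothetical.
 */
public class CoordRanking {

    static float coord(int overlap, int maxOverlap, double power) {
        return (float) Math.pow(overlap / (float) maxOverlap, power);
    }

    static float score(int overlap, int maxOverlap, float rawSum, double power) {
        return coord(overlap, maxOverlap, power) * rawSum;
    }

    public static void main(String[] args) {
        int maxOverlap = 5;
        // Doc A: all 5 clauses match, but only common (low-weight) terms.
        int overlapA = 5;
        float rawA = 1.0f;
        // Doc B: only 4 clauses match, but with rare (high-weight) terms.
        int overlapB = 4;
        float rawB = 3.0f;

        for (double power : new double[] {1.0, 4.0, 5.0}) {
            float a = score(overlapA, maxOverlap, rawA, power);
            float b = score(overlapB, maxOverlap, rawB, power);
            System.out.printf("power=%.1f  A=%.4f  B=%.4f  first=%s%n",
                    power, a, b, a > b ? "A" : "B");
        }
    }
}
```

With these numbers B outranks A for powers 1 and 4 and only drops behind at power 5, because B wins whenever rawB / rawA exceeds (5/4)^power. Since the other factors have no fixed upper bound, no single power can guarantee an ordering by the number of matched clauses.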

Regards,
Paul Elschot



> Thanks,
> Gururaja
> 
> Mike Snare wrote:
> I'm still new to Lucene, but wouldn't that be the coord()? My
> understanding is that the coord() is the fraction of the boolean query
> that matched a given document.
> 
> Again, I'm new, so somebody else will have to confirm or deny...
> 
> -Mike
> 
> 
> On Mon, 20 Dec 2004 00:33:21 -0800 (PST), Gururaja H wrote:
> > How can I find out the percentage of query terms matched in each document using Lucene?
> > Here is an example of what I am trying to do:
> > The search query has 5 terms (ibm, risc, tape, drive, manual) and there are 4 matching
> > documents with the following attributes:
> > Doc#1: contains terms (ibm, drive)
> > Doc#2: contains terms (ibm, risc, tape, drive)
> > Doc#3: contains terms (ibm, risc, tape, drive)
> > Doc#4: contains terms (ibm, risc, tape, drive, manual)
> > The percentages displayed would be 100% (Doc#4), 80% (Doc#2), 80% (Doc#3) and 40% (Doc#1).
> > 
> > Any help on how to go about this?
> > 
> > Thanks,
> > Gururaja
> > 
> > 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 
> 
> 
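If what you need is the displayed percentage itself rather than a ranking, it can be computed directly as overlap / maxOverlap * 100. The sketch below does that in plain Java for the example in the question, assuming you already have the set of terms each hit contains (how to obtain that set from the index is not covered here). The class and method names are hypothetical, and the query term is taken as "drive", the spelling the documents use.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: percentage of query terms present in each document. */
public class MatchPercentage {

    /** Counts query terms found in docTerms and returns overlap / maxOverlap as a percentage. */
    static int percentMatched(List<String> queryTerms, Set<String> docTerms) {
        int overlap = 0;
        for (String term : queryTerms) {
            if (docTerms.contains(term)) {
                overlap++;
            }
        }
        // Same ratio as DefaultSimilarity.coord(), scaled to 0..100.
        return Math.round(100f * overlap / queryTerms.size());
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("ibm", "risc", "tape", "drive", "manual");

        // Document term sets from the example; insertion order is preserved for printing.
        Map<String, Set<String>> docs = new LinkedHashMap<>();
        docs.put("Doc#1", new HashSet<>(Arrays.asList("ibm", "drive")));
        docs.put("Doc#2", new HashSet<>(Arrays.asList("ibm", "risc", "tape", "drive")));
        docs.put("Doc#3", new HashSet<>(Arrays.asList("ibm", "risc", "tape", "drive")));
        docs.put("Doc#4", new HashSet<>(Arrays.asList("ibm", "risc", "tape", "drive", "manual")));

        for (Map.Entry<String, Set<String>> e : docs.entrySet()) {
            System.out.println(e.getKey() + ": " + percentMatched(query, e.getValue()) + "%");
        }
    }
}
```

This prints 40% for Doc#1, 80% for Doc#2 and Doc#3, and 100% for Doc#4, matching the figures in the question.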

