lucene-java-user mailing list archives

From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: recall/precision with lucene
Date Sat, 09 Feb 2008 22:00:14 GMT
On Saturday 09 February 2008 01:59:12, Panos Konstantinidis wrote:
> Hello, I am a new Lucene user. I am trying to calculate the recall/precision
> of a query and I was wondering if Lucene provides an easy way to do it.
> 
> Currently I have a number of documents that match a given query. Then I am
> doing a search and I am getting back all the Hits. I then divide the number of
> documents that came back from Lucene (the Hits size) by the number of
> documents that I should have got. This is how I calculate the recall.

Since you're going to use all hits for the query, it is normally better to avoid
Hits and use a HitCollector or a TopDocs.
 
> For precision I just get the hits.score() of each relevant document. I am not
> sure if I am on the right track or if there is an easier/better way to do it. I
> would appreciate any insight into this.

To use the score value for precision, one could define a cutoff value for
the score, but then the calculation for recall would also need to be
adapted to count only the hits above that cutoff. A HitCollector would
be a good fit for this.
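
A minimal sketch of the cutoff idea in plain Java, without any Lucene
classes. It assumes the (id, score) pairs were already gathered, for
instance by a HitCollector, and that the set of relevant ids is known
from manual judgments. The class name CutoffEval, the method
precisionRecall and the parallel arrays are hypothetical names chosen
for illustration. Precision here is the usual relevant-retrieved over
retrieved, and recall is relevant-retrieved over all relevant.

```java
import java.util.Set;

/** Precision and recall over the hits whose score is at or above a cutoff. */
class CutoffEval {
    /**
     * ids and scores are parallel arrays (an assumption of this sketch):
     * ids[i] is the unique identifier of the i-th collected hit and
     * scores[i] is its score. Returns {precision, recall}.
     */
    static double[] precisionRecall(String[] ids, float[] scores,
                                    Set<String> relevant, float cutoff) {
        int retrieved = 0;          // hits with score >= cutoff
        int relevantRetrieved = 0;  // of those, the ones judged relevant
        for (int i = 0; i < ids.length; i++) {
            if (scores[i] >= cutoff) {
                retrieved++;
                if (relevant.contains(ids[i])) {
                    relevantRetrieved++;
                }
            }
        }
        // Guard against division by zero when nothing passes the cutoff
        // or when no document was judged relevant.
        double precision = retrieved == 0
                ? 0.0 : (double) relevantRetrieved / retrieved;
        double recall = relevant.isEmpty()
                ? 0.0 : (double) relevantRetrieved / relevant.size();
        return new double[] { precision, recall };
    }
}
```

Note that both values use the same numerator, the relevant hits that
passed the cutoff, which is why the recall calculation changes together
with the precision calculation.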

In case you want the results sorted by decreasing score value, have
a look at the search methods that return TopDocs. From this one
can make a precision/recall graph for the query by counting, for each
score value, the results that score at least that high.
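
The graph can be sketched as one (precision, recall) point per rank.
The plain-Java example below assumes the ids are already sorted by
decreasing score, as the entries of a TopDocs result would be, and that
the relevant ids are known. PrCurve and curve are hypothetical names,
not part of Lucene.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Precision/recall points for a ranked result list. */
class PrCurve {
    /**
     * rankedIds must be sorted by decreasing score. The returned list
     * holds one {precision, recall} pair per rank, computed over the
     * results from the top down to and including that rank.
     */
    static List<double[]> curve(List<String> rankedIds, Set<String> relevant) {
        List<double[]> points = new ArrayList<double[]>();
        int relevantSeen = 0;  // relevant documents found so far
        for (int rank = 1; rank <= rankedIds.size(); rank++) {
            if (relevant.contains(rankedIds.get(rank - 1))) {
                relevantSeen++;
            }
            double precision = (double) relevantSeen / rank;
            double recall = relevant.isEmpty()
                    ? 0.0 : (double) relevantSeen / relevant.size();
            points.add(new double[] { precision, recall });
        }
        return points;
    }
}
```

Plotting recall on one axis and precision on the other gives the graph.
When several results share a score, only the point at the last rank of
each tie corresponds to an actual score threshold.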

When a lot of such computations are needed, you may also want
to cache the values of a unique identifier field for all indexed docs.
Have a look at FieldCache for this.
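
To illustrate why such a cache helps, here is a plain-Java stand-in. It
is not Lucene's FieldCache, only a sketch of the same idea: read the
identifier of every document once, keep the values in an array indexed
by document number, and answer all later lookups from memory. IdCache,
maxDoc and loadId are hypothetical names; loadId stands for whatever
slow lookup would otherwise run for every hit.

```java
import java.util.function.IntFunction;

/** Loads the identifier of every document once and serves lookups from memory. */
class IdCache {
    private final String[] idByDoc;  // index: document number, value: identifier

    /**
     * maxDoc is the number of documents; loadId is the (slow) way to
     * read the identifier of one document. It is called exactly once
     * per document, here in the constructor.
     */
    IdCache(int maxDoc, IntFunction<String> loadId) {
        idByDoc = new String[maxDoc];
        for (int doc = 0; doc < maxDoc; doc++) {
            idByDoc[doc] = loadId.apply(doc);
        }
    }

    /** Array lookup; no further call to loadId. */
    String id(int doc) {
        return idByDoc[doc];
    }
}
```

With the identifiers in memory, mapping each collected document number
to its identifier during evaluation costs only an array access.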

Regards,
Paul Elschot
