lucene-openrelevance-user mailing list archives

From Ludovico Boratto <ludovico.bora...@gmail.com>
Subject Re: Calculating a search engine's MAP
Date Thu, 14 Jan 2010 09:32:26 GMT
Hi everyone,
sorry to bother you, but I really can't get past this problem.
I have developed my search engine, but I don't know how to evaluate it.
Let me briefly explain how it works...

My algorithm is based on implicit feedback. A feedback event is collected each
time a user finds a relevant resource during a search in a tagging system.
The algorithm uses that feedback to dynamically strengthen the associations
between the resource indicated by the user and the keywords used in the
search string. These keyword-resource associations are then used by the
algorithm to rank the results.
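In rough terms, the core of it is something like the following (a minimal
Python sketch of my own; the in-memory structure and the fixed increment are
only illustrative, not the actual implementation):

from collections import defaultdict

# association[keyword][resource] -> accumulated strength of that
# keyword-resource link (illustrative in-memory structure only)
association = defaultdict(lambda: defaultdict(float))

def record_feedback(query_keywords, chosen_resource, increment=1.0):
    # Strengthen the link between each keyword of the search string and the
    # resource the user indicated as relevant (one implicit feedback event).
    for kw in query_keywords:
        association[kw][chosen_resource] += increment

def rank(query_keywords, candidate_resources):
    # Score each candidate by the sum of its association weights with the
    # query keywords, and rank by decreasing score.
    def score(resource):
        return sum(association[kw].get(resource, 0.0) for kw in query_keywords)
    return sorted(candidate_resources, key=score, reverse=True)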

I have been looking for ages for a proper dataset that would work with my
algorithm.
I was thinking about using TREC's 2008 Relevance Feedback dataset:
http://trec-relfeed.googlegroups.com/web/Guidelines08?gda=gZ0eUT4AAABtm9akyKg9pgh0qJJTHfy7X57I390rHU2uANbDSEOX3Kddn9WBc2Ae6sNICG8Kz2zjsKXVs-X7bdXZc5buSfmx
As you can see from the document, for each query one or more relevance
judgments are given (i.e. one or more known relevant results).
The thing is: how can I evaluate the quality of the ranking produced by my
system?
Should I compare it with the ranking produced by another system, like Indri
or Lucene?
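
Just to make sure I understand the metric in the subject line: my reading is
that MAP over those judgments would be computed roughly as below (a sketch of
my own, assuming the standard TREC qrels format "topic iteration docno
relevance" and binary relevance, not based on any particular tool):

def load_qrels(path):
    # TREC qrels lines look like: "<topic> <iteration> <docno> <relevance>".
    # Keep, for each topic, the set of docnos judged relevant (relevance > 0).
    qrels = {}
    with open(path) as f:
        for line in f:
            topic, _iteration, docno, relevance = line.split()
            if int(relevance) > 0:
                qrels.setdefault(topic, set()).add(docno)
    return qrels

def average_precision(ranked_docnos, relevant_docnos):
    # Sum of precision@k over the ranks k where a relevant document appears,
    # divided by the number of known relevant documents for the topic.
    hits, precision_sum = 0, 0.0
    for k, docno in enumerate(ranked_docnos, start=1):
        if docno in relevant_docnos:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant_docnos) if relevant_docnos else 0.0

def mean_average_precision(runs, qrels):
    # runs: topic -> ranked list of docnos produced by the system under test.
    aps = [average_precision(docnos, qrels.get(topic, set()))
           for topic, docnos in runs.items()]
    return sum(aps) / len(aps) if aps else 0.0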

I really hope you can help me; I'm stuck on this problem and don't know how
to solve it.
Thanks in advance for your help.
Cheers,
Ludovico

2009/12/23 Grant Ingersoll <gsingers@apache.org>

>
> On Dec 16, 2009, at 8:26 AM, Ludovico Boratto wrote:
>
> Hi,
> thanks for your reply.
> How can trec_eval work properly?
> A standard TREC run contains 1000 results per topic, while the relevance
> judgments cover a much smaller number of documents (usually around 50).
>
> How can I calculate precision and recall if I don't know the relevance of
> 95% of the documents in the ranking I produced?
>
>
> You calculate them on the judgments you have, i.e. precision at 50 (or 10,
> or whatever). 1000 results is overkill for all practical purposes anyway.
>
> The formula is pretty straightforward:
> http://en.wikipedia.org/wiki/Precision_%28information_retrieval%29
>
> -Grant
>

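Following up on the exchange above about computing precision only on the
judged documents: my understanding is that precision at a cutoff simply
treats every unjudged document as non-relevant (the usual TREC convention),
i.e. roughly this (again just a sketch of my understanding):

def precision_at_k(ranked_docnos, relevant_docnos, k=10):
    # Fraction of the top-k results judged relevant; every unjudged document
    # is counted as non-relevant, as is standard in TREC-style evaluation.
    top_k = ranked_docnos[:k]
    return sum(1 for docno in top_k if docno in relevant_docnos) / k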