lucene-openrelevance-user mailing list archives

From Mark Bennett <mbenn...@ideaeng.com>
Subject Revising Precision and Recall (was Re: Calculating a search engine's MAP)
Date Wed, 23 Dec 2009 23:07:13 GMT
Hello Ludovico,

I'm coming into this conversation a bit late... but this is an interesting
subject to me.

People have talked about "precision and recall" for a very long time.
Personally I'm tired of them, because those metrics say nothing about the
order of the results or about helping users navigate them.

I'd like to see "rank/order", and "interactivity" routinely factored in.

Rank/order:
For example, given 100 documents that should match, how many made it into
the top 3 slots of the results list, or the top 5, top 10, etc.?
How about summing the reciprocal ranks of the relevant documents that were
returned, along the lines of MRR:
http://en.wikipedia.org/wiki/MRR
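
Here's a rough sketch of what I have in mind, in Python (the doc IDs and
judgment sets below are invented purely for illustration):

# Toy sketch: precision@k and reciprocal rank for one query's result list.
# "results" is the ranked list of doc IDs the engine returned;
# "relevant" is the set of doc IDs judged relevant.  Both are hypothetical.

def precision_at_k(results, relevant, k):
    top_k = results[:k]
    return sum(1 for doc in top_k if doc in relevant) / float(k)

def reciprocal_rank(results, relevant):
    for rank, doc in enumerate(results, 1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

# Averaging reciprocal_rank over a set of queries gives MRR.
queries = [
    (["d3", "d7", "d1"], {"d1", "d9"}),  # first relevant hit at rank 3
    (["d2", "d5", "d8"], {"d2"}),        # first relevant hit at rank 1
]
mrr = sum(reciprocal_rank(r, rel) for r, rel in queries) / len(queries)
print(precision_at_k(["d3", "d7", "d1"], {"d1", "d9"}, 3), mrr)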

Interactivity/disambiguation:
Somebody's search for "phoenix" matches 100,000 docs, but on the first page
of results, are they clearly shown the various contexts, e.g. that Phoenix is
both a city in Arizona and a mythological creature?
For example, the disambiguation page for Phoenix has way over 100 different
contexts:
http://en.wikipedia.org/wiki/Phoenix
How about the ratio of how many contexts the search engine presented on the
first page vs. the number of contexts on the Wikipedia page?  Of course that
would only work for terms that have a disambiguation page, but I still think
it's an interesting question to ask.

Or even the number of narrowing-context links the engine displays.
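
Very roughly, something like this (how you'd get the per-result context
labels, whether from facets, clustering, or manual tagging, is an open
question, so treat it as a sketch):

# Toy sketch: "context coverage" of the first results page.
# first_page_contexts: distinct senses the engine surfaced on page one.
# disambiguation_contexts: senses listed on the Wikipedia disambiguation page.
# Both sets are hypothetical inputs.

def context_coverage(first_page_contexts, disambiguation_contexts):
    if not disambiguation_contexts:
        return None  # no disambiguation page, so the metric doesn't apply
    covered = set(first_page_contexts) & set(disambiguation_contexts)
    return len(covered) / float(len(disambiguation_contexts))

# e.g. a first page that only surfaces the city and the mythical bird:
print(context_coverage({"city", "mythical bird"},
                       {"city", "mythical bird", "band", "film", "spacecraft"}))
# -> 0.4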

--
Mark Bennett / New Idea Engineering, Inc. / mbennett@ideaeng.com
Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513


On Wed, Dec 23, 2009 at 6:45 AM, Grant Ingersoll <gsingers@apache.org> wrote:

>
> On Dec 16, 2009, at 8:26 AM, Ludovico Boratto wrote:
>
> Hi,
> thanks for your reply.
> How can trec_eval work properly?
> A standard TREC ranking contains 1000 results, while the relevance judgments
> cover a much smaller number of documents (usually 50).
>
> How can I calculate precision and recall if I don't know how relevant 95% of
> the documents in the ranking I produced are?
>
>
> You calculate them on what you have, i.e. precision at 50 (or 10 or
> whatever).  1000 results is overkill for all practical purposes anyway.
>
> The formula is pretty straightforward:
> http://en.wikipedia.org/wiki/Precision_%28information_retrieval%29
>
> -Grant
>
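
(For what it's worth, restricted to the judged depth the calculation Grant
points at is only a few lines.  The names below are illustrative, not
trec_eval's actual internals.)

# Toy sketch: precision and recall for one topic, cut off at the judged depth.
# ranked_docs is the topic's ranked run; relevant is the set of doc IDs
# judged relevant for that topic.

def precision_recall_at_k(ranked_docs, relevant, k=50):
    retrieved = ranked_docs[:k]
    hits = sum(1 for doc in retrieved if doc in relevant)
    precision = hits / float(k)
    recall = hits / float(len(relevant)) if relevant else 0.0
    return precision, recall

# e.g. 10 relevant hits in the top 50, with 50 judged-relevant docs in total:
# precision@50 = 0.2, recall = 0.2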
