From: Ludovico Boratto <ludovico.boratto@gmail.com>
To: openrelevance-user@lucene.apache.org
Date: Wed, 16 Dec 2009 14:26:18 +0100
Subject: Re: Calculating a search engine's MAP

Hi,
thanks for your reply.
How can trec_eval work properly?
A standard TREC ranking contains 1000 results, while the relevance judgments cover a much smaller number of documents (usually about 50).

How can I calculate precision and recall if I don't know whether 95% of the documents in the ranking I produced are relevant?
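To make my question concrete, here is a rough Python sketch of how I understand average precision is computed when only a few documents are judged: every document that is not in the judgments is simply counted as non-relevant (which is, as far as I can tell, what trec_eval does). The ranking and the judged set below are just made-up examples.

# Sketch of average precision for a single query with incomplete judgments.
# Unjudged documents are treated as non-relevant.
def average_precision(ranking, judged_relevant):
    # ranking: document ids in rank order; judged_relevant: ids judged relevant
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in judged_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    # normalise by the number of known relevant documents, not by the ranking length
    return precision_sum / len(judged_relevant) if judged_relevant else 0.0

ranking = ["d4", "d9", "d1", "d7", "d2", "d8", "d3", "d6", "d5", "d0"]  # made-up run
judged_relevant = {"d9", "d2", "d5"}                                    # made-up qrels
print(average_precision(ranking, judged_relevant))  # about 0.411

MAP would then just be the mean of this value over all the topics. Is this the right way to think about it?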

Thanks in advance for your help,
Ludovico

2009/12/9 Grant Ingersoll <gsingers@apache.org>

On Dec 9, 2009, at 9:46 AM, Ludovico Boratto wrote:

> Hi everyone,
> I'm a PhD student, and I was wondering how it is possible to evaluate a search engine's performance with a dataset like the ones made available for the TREC tracks.
>
> The problem I have is: once I submit a query and the search engine forms a list of ranked results, how do I know which documents are relevant and which are not?
> I know that relevance judgments are available with each track, but those judgments cover only a small number of queries and documents.
>

I think there are two questions in here, if I'm not mistaken. First, TREC delivers a set of qrels files that capture this information, based on the way they do relevance pooling. If you search for trec_eval, you will find a tool that takes in your results and the TREC judgments and outputs MAP, etc.
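Roughly speaking (from memory, so double check the exact details), the qrels file has one judgment per line in the form

  topic-id 0 document-id relevance

your results ("run") file has one retrieved document per line in the form

  topic-id Q0 document-id rank score run-tag

and then something like

  trec_eval qrels-file run-file

prints MAP along with the other standard measures.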
The second question I'm inferring is that you want to know how this applies to your own search application, that is, how you judge relevance for your app. The answer to that is a bit harder. Essentially, you need to go through and create the queries and judgments. Many people use log analysis to achieve this. You might find http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Debugging-Relevance-Issues-Search helpful.
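Purely as an illustration (the log format, the click threshold, and the idea of treating clicks as a weak relevance signal below are my own invention, not anything standard), turning a click log into qrels-style judgments can be as simple as:

# Illustrative sketch: derive qrels-style judgments from a tab-separated
# click log of the form  query<TAB>doc_id<TAB>clicked(0/1).
# A document clicked at least min_clicks times for a query is marked relevant.
import csv
from collections import defaultdict

def clicks_to_qrels(log_path, qrels_path, min_clicks=2):
    clicks = defaultdict(int)
    with open(log_path, newline="") as log:
        for query, doc_id, clicked in csv.reader(log, delimiter="\t"):
            if clicked == "1":
                clicks[(query, doc_id)] += 1
    with open(qrels_path, "w") as out:
        for (query, doc_id), n in sorted(clicks.items()):
            rel = 1 if n >= min_clicks else 0
            # the raw query string stands in for a topic id here
            out.write("%s 0 %s %d\n" % (query, doc_id, rel))

The hard part, of course, is deciding how much you trust clicks as a proxy for real relevance judgments.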


> Since these are my first steps in the IR world, I hope you don't mind helping me.
>
> Thanks in advance for your help; I'm looking forward to hearing from you soon.
> Yours faithfully,
> Ludovico

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search

