From: Ludovico Boratto <ludovico.boratto@gmail.com>
To: openrelevance-user@lucene.apache.org
Date: Wed, 16 Dec 2009 14:26:18 +0100
Subject: Re: Calculating a search engine's MAP

Hi,
thanks for your reply.
How can trec_eval work properly?
A standard TREC ranking contains 1000 results, while the relevance judgments cover a much smaller number of documents (usually about 50).

How can I calculate precision and recall if I don't know whether 95% of the documents in the ranking I produced are relevant?
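To make my question concrete, here is a rough Python sketch of how I understand average precision is computed when only a few documents are judged: every document that is not in the judgments is simply counted as non-relevant (which is, as far as I can tell, what trec_eval does). The ranking and the judged set below are just made-up examples.

# Sketch of average precision for a single query with incomplete judgments.
# Unjudged documents are treated as non-relevant.
def average_precision(ranking, judged_relevant):
    # ranking: document ids in rank order; judged_relevant: ids judged relevant
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in judged_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    # normalise by the number of known relevant documents, not by the ranking length
    return precision_sum / len(judged_relevant) if judged_relevant else 0.0

ranking = ["d4", "d9", "d1", "d7", "d2", "d8", "d3", "d6", "d5", "d0"]  # made-up run
judged_relevant = {"d9", "d2", "d5"}                                    # made-up qrels
print(average_precision(ranking, judged_relevant))  # about 0.411

MAP would then just be the mean of this value over all the topics. Is this the right way to think about it?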

Thanks in advance for your help,
Ludovico

2009/12/9 Grant Ingersoll <gsingers@apache.org>

On Dec 9, 2009, at 9:46 AM, Ludovico Boratto wrote:

> Hi everyone,
> I'm a PhD student, and I was wondering how it is possible to evaluate a search engine's performance with a dataset like the ones made available for the TREC tracks.
>
> The problem I have is: once I submit a query and the search engine forms a list of ranked results, how do I know which documents are relevant and which are not?
> I know that relevance judgments are available with each track, but those judgments cover only a small number of queries and documents.
>

I think there are two questions in here, if I'm not mistaken. First, TREC delivers a set of qrels files that capture this information, based on the way they do relevance pooling. If you search for trec_eval, you will find a tool that takes in your results and the TREC judgments and outputs MAP, etc.
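Roughly speaking (from memory, so double check the exact details), the qrels file has one judgment per line in the form

  topic-id 0 document-id relevance

your results ("run") file has one retrieved document per line in the form

  topic-id Q0 document-id rank score run-tag

and then something like

  trec_eval qrels-file run-file

prints MAP along with the other standard measures.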
The second question I'm inferring is that you want to know how this applies to your own search application, that is, how you judge relevance for your app. The answer to that is a bit harder. Essentially, you need to go through and create the queries and judgments. Many people use log analysis to achieve this. You might find http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Debugging-Relevance-Issues-Search helpful.
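Purely as an illustration (the log format, the click threshold, and the idea of treating clicks as a weak relevance signal below are my own invention, not anything standard), turning a click log into qrels-style judgments can be as simple as:

# Illustrative sketch: derive qrels-style judgments from a tab-separated
# click log of the form  query<TAB>doc_id<TAB>clicked(0/1).
# A document clicked at least min_clicks times for a query is marked relevant.
import csv
from collections import defaultdict

def clicks_to_qrels(log_path, qrels_path, min_clicks=2):
    clicks = defaultdict(int)
    with open(log_path, newline="") as log:
        for query, doc_id, clicked in csv.reader(log, delimiter="\t"):
            if clicked == "1":
                clicks[(query, doc_id)] += 1
    with open(qrels_path, "w") as out:
        for (query, doc_id), n in sorted(clicks.items()):
            rel = 1 if n >= min_clicks else 0
            # the raw query string stands in for a topic id here
            out.write("%s 0 %s %d\n" % (query, doc_id, rel))

The hard part, of course, is deciding how much you trust clicks as a proxy for real relevance judgments.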


> Since these are my first steps in the IR world, I hope you don't mind helping me.
>
> Thanks in advance for your help; I'm looking forward to hearing from you soon.
> Yours faithfully,
> Ludovico

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search

