lucene-java-user mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: TREC Data and Topic-Specific Index
Date Mon, 08 Feb 2010 03:59:22 GMT
you should do (a), and pretend you know nothing about the relevance
judgements up front.

it is true you might make some change to your search engine and wonder, how
is it fair that I am bringing back possibly relevant docs that were never
judged (and thus scored implicitly as non-relevant)? i.e. the test
collection is biased against you because you did not participate in the
pooling process.

if you are concerned about this, you should still use (a), but perhaps look
at other measures such as bpref (
http://comminfo.rutgers.edu/~muresan/IR/Docs/Articles/sigirBuckley2004.pdf).
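
to make the difference concrete, here is a rough sketch (not the trec_eval implementation, just my reading of the definitions) of average precision and bpref for a single topic. the function names, the dict-based qrels layout and the toy doc ids are assumptions made up for illustration. AP counts every unjudged doc as non-relevant, while bpref skips unjudged docs and only penalizes a relevant doc for judged non-relevant docs ranked above it. the bpref here follows the formula in the paper linked above; trec_eval uses a slightly different denominator.

```python
def average_precision(ranking, qrels):
    """AP for one topic; unjudged docs are implicitly non-relevant."""
    relevant = {doc for doc, rel in qrels.items() if rel > 0}
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant doc
    return total / len(relevant)


def bpref(ranking, qrels):
    """bpref for one topic; unjudged docs are ignored entirely."""
    relevant = {doc for doc, rel in qrels.items() if rel > 0}
    nonrelevant = {doc for doc, rel in qrels.items() if rel == 0}
    num_rel = len(relevant)
    if num_rel == 0:
        return 0.0
    nonrel_seen = 0
    total = 0.0
    for doc in ranking:
        if doc in nonrelevant:
            nonrel_seen += 1
        elif doc in relevant:
            # only the first num_rel judged non-relevant docs count
            total += 1.0 - min(nonrel_seen, num_rel) / num_rel
        # anything else is unjudged and skipped
    return total / num_rel


# toy data (hypothetical): u1 and u2 were never judged
qrels = {"d1": 1, "d2": 0, "d3": 1, "d4": 0}
ranking = ["u1", "d1", "u2", "d3", "d2"]
print(average_precision(ranking, qrels))  # 0.5
print(bpref(ranking, qrels))              # 1.0
```

in the toy run the two unjudged docs at ranks 1 and 3 pull AP down to 0.5, but bpref stays at 1.0 because no judged non-relevant doc is ranked above a relevant one.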

personally, I simply prefer to stick with MAP. And with all measures,
whether you look at bpref or MAP, my advice is to consider only large
differences when evaluating some potential improvement!

On Sun, Feb 7, 2010 at 6:49 PM, Ivan Provalov <iprovalo@yahoo.com> wrote:

> Robert,
>
> We are using TREC-3 data and Ad Hoc topics 151-200.  The relevance
> judgments list contains 97,319 entries, of which 68,559 are unique document
> ids.  The TIPSTER collection which was used in TREC-3 is around 750,000
> documents.
>
> Should we (a) index the entire 750,000 document collection, or (b) the
> document collection of the 68,559 unique documents listed in the qrels, or
> (c) should we limit our index to each specific topic (about 2,000 docs) i.e.
> to the documents listed for a particular topic in the qrels?
>
> Thanks,
>
> Ivan


-- 
Robert Muir
rcmuir@gmail.com
