lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ivan Provalov <iprov...@yahoo.com>
Subject Re: TREC Data and Topic-Specific Index
Date Wed, 10 Feb 2010 14:14:36 GMT
Robert,

Thank you for your reply.  What would be considered a large difference?  We started applying
the Sweet Spot Similarity.  It gives us an improvement of 0.163-0.141=0.022 MAP so far.  LnbLtcSimilarity
gets us more improvement: 0.175-0.141=0.034.

Thanks,

Ivan

--- On Sun, 2/7/10, Robert Muir <rcmuir@gmail.com> wrote:

> From: Robert Muir <rcmuir@gmail.com>
> Subject: Re: TREC Data and Topic-Specific Index
> To: java-user@lucene.apache.org
> Date: Sunday, February 7, 2010, 10:59 PM
> you should do (a), and pretend you
> know nothing about the relevance
> judgements up front.
> 
> it is true you might make some change to your search engine
> and wonder, how
> is it fair that I am bringing back possibly relevant docs
> that were never
> judged (and thus scored implicitly as non-relevant)? i.e.
> the test
> collection is biased against you because you did not
> participate in the
> pooling process.
> 
> if you are concerned about this, you should still use (a),
> but perhaps look
> at other measures such as bpref (
> http://comminfo.rutgers.edu/~muresan/IR/Docs/Articles/sigirBuckley2004.pdf).
> 
> personally, I simply prefer to stick with MAP. And with all
> measures,
> whether you look at bpref or map, my advice is to only
> consider large
> differences only when evaluating some potential
> improvement!
> 
> On Sun, Feb 7, 2010 at 6:49 PM, Ivan Provalov <iprovalo@yahoo.com>
> wrote:
> 
> > Robert,
> >
> > We are using TREC-3 data and Ad Hoc topics
> 151-200.  The relevance
> > judgments list contains 97,319 entries, of which
> 68,559 are unique document
> > ids.  The TIPSTER collection which was used in
> TREC-3 is around 750,000
> > documents.
> >
> > Should we (a) index the entire 750,000 document
> collection, or (b) the
> > document collection of the 68,559 unique documents
> listed in the qrels, or
> > (c) should we limit our index to each specific topic
> (about 2,000 docs) i.e.
> > to the documents listed for a particular topic in the
> qrels?
> >
> > Thanks,
> >
> > Ivan
> >
> >
> >
> >
> >
> ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> 
> 
> -- 
> Robert Muir
> rcmuir@gmail.com
> 


      

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message