lucene-java-user mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: Average Precision - TREC-3
Date Wed, 27 Jan 2010 16:16:53 GMT
Hello, forgive my ignorance here (I have not worked with these English TREC
collections), but is the TREC-3 test collection the same as the test
collection used in the 2007 paper you referenced?

It looks like it is a different collection; it's not really possible to
compare these relevance scores across different collections.

On Wed, Jan 27, 2010 at 11:06 AM, Grant Ingersoll <gsingers@apache.org> wrote:

>
> On Jan 26, 2010, at 8:28 AM, Ivan Provalov wrote:
>
> > We are looking into making some improvements to relevance ranking of our
> > search platform based on Lucene.  We started by running the Ad Hoc TREC
> > task on the TREC-3 data using "out-of-the-box" Lucene.  The reason to run
> > this old TREC-3 data (TIPSTER Disk 1 and Disk 2; topics 151-200) was that
> > its content matches the content of our production system.
> >
> > We are currently getting an average precision of 0.14.  We found some
> > format issues with the TREC-3 data which were causing an even lower
> > score.  For example, the initial average precision number was 0.09.  We
> > discovered that the topics included the word "Topic:" in the <title>
> > tag, e.g. "<title> Topic:  Coping with overcrowded prisons".  By removing
> > this term from the queries, we bumped the average precision to 0.14.
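
A minimal sketch of that cleanup step (the helper name and regex here are
illustrative, not from the thread):

    // Strip the boilerplate "Topic:" prefix from a TREC-3 <title> line so
    // it is not scored as a query term.
    static String cleanTitle(String rawTitle) {
        return rawTitle.replaceFirst("^\\s*Topic:\\s*", "").trim();
    }

    // cleanTitle("Topic:  Coping with overcrowded prisons")
    //   -> "Coping with overcrowded prisons"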
>
> There's usually a lot of this involved in running TREC.  I've also seen a
> good deal of improvement from things like using phrase queries and the
> Dismax Query Parser in Solr (which uses DisjunctionMaxQuery in Lucene,
> amongst other things) and by playing around with length normalization.
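
As a rough illustration of what that looks like at the Lucene level (the
field names and the 0.1 tie-breaker are just examples, not a recommendation):

    // Score a term against several fields, taking the best per-field score
    // rather than the sum; the tie-breaker adds back a fraction of the
    // non-maximum scores.
    DisjunctionMaxQuery dmq = new DisjunctionMaxQuery(0.1f);
    dmq.add(new TermQuery(new Term("title", "prisons")));
    dmq.add(new TermQuery(new Term("TEXT", "prisons")));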
>
>
> >
> > Our query is based on the <title> tag of the topic and the index field
> > is based on the <TEXT> tag of the document.
> >
> > QualityQueryParser qqParser = new SimpleQQParser("title", "TEXT");
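
For context, the driver around that line, using Lucene's contrib/benchmark
quality package, looks roughly like this (the file names, the "docno" field,
and the logger setup are assumptions about the local setup):

    // Classes come from org.apache.lucene.benchmark.quality and its
    // .trec/.utils subpackages (Lucene 3.0-era contrib/benchmark API).
    PrintWriter logger = new PrintWriter(System.out, true);
    IndexSearcher searcher = new IndexSearcher(FSDirectory.open(new File("index")));

    // Read the 50 topics and the corresponding relevance judgments.
    QualityQuery[] qqs = new TrecTopicsReader().readQueries(
        new BufferedReader(new FileReader("topics.151-200")));
    Judge judge = new TrecJudge(new BufferedReader(new FileReader("qrels.151-200")));
    judge.validateData(qqs, logger);

    // Run the title queries against the TEXT field and collect quality stats.
    QualityQueryParser qqParser = new SimpleQQParser("title", "TEXT");
    QualityBenchmark qrun = new QualityBenchmark(qqs, qqParser, searcher, "docno");
    QualityStats[] stats = qrun.execute(judge,
        new SubmissionReport(logger, "lucene-baseline"), logger);

    // Mean over topics; MAP is the number the TREC papers quote.
    QualityStats.average(stats).log("SUMMARY", 2, logger, "  ");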
> >
> > Is there an average precision number which "out-of-the-box" Lucene should
> > be close to?  For example, this 2007 TREC paper from IBM mentions 0.154:
> > http://trec.nist.gov/pubs/trec16/papers/ibm-haifa.mq.final.pdf
>
> Hard to say.  I can't say I've run TREC 3.  You might ask over on the Open
> Relevance list too (http://lucene.apache.org/openrelevance).  I know
> Robert Muir's done a lot of experiments with Lucene on standard collections
> like TREC.
>
> I guess the bigger question back to you is: what is your goal?  Is it to
> get better at TREC or to actually tune your system?
>
> -Grant
>
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem using Solr/Lucene:
> http://www.lucidimagination.com/search
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>


-- 
Robert Muir
rcmuir@gmail.com
