lucene-java-user mailing list archives

From Ivan Provalov <iprov...@yahoo.com>
Subject Re: TREC Data and Topic-Specific Index
Date Thu, 11 Feb 2010 15:47:43 GMT
Thank you, Robert.

--- On Wed, 2/10/10, Robert Muir <rcmuir@gmail.com> wrote:

> From: Robert Muir <rcmuir@gmail.com>
> Subject: Re: TREC Data and Topic-Specific Index
> To: java-user@lucene.apache.org
> Date: Wednesday, February 10, 2010, 9:23 AM
> Hi, so you mean around 15% and 24% respectively? I think you could
> fairly say either of these is an improvement over your baseline of
> 0.141.
> 
> What I mean by a large difference is this: while I think it's safe to
> say that either of these methods improves over your baseline, I am not
> sure you can conclude that one improvement is better than the other.
> 
> You can apply various statistical tests to try to figure this out, but
> because you didn't participate in the pool with these runs, you would
> have to be careful about drawing conclusions as to which similarity is
> best, as there is some bias and error involved.
> 
> On Wed, Feb 10, 2010 at 9:14 AM, Ivan Provalov <iprovalo@yahoo.com> wrote:
> 
> > Robert,
> >
> > Thank you for your reply.  What would be considered a large
> > difference?  We started applying the Sweet Spot Similarity.  It gives
> > us an improvement of 0.163-0.141=0.022 MAP so far.  LnbLtcSimilarity
> > gets us more improvement: 0.175-0.141=0.034.
> >
> > Thanks,
> >
> > Ivan
> >
> > --- On Sun, 2/7/10, Robert Muir <rcmuir@gmail.com> wrote:
> >
> > > From: Robert Muir <rcmuir@gmail.com>
> > > Subject: Re: TREC Data and Topic-Specific Index
> > > To: java-user@lucene.apache.org
> > > Date: Sunday, February 7, 2010, 10:59 PM
> > > You should do (a), and pretend you know nothing about the relevance
> > > judgements up front.
> > >
> > > It is true you might make some change to your search engine and
> > > wonder, how is it fair that I am bringing back possibly relevant docs
> > > that were never judged (and thus scored implicitly as non-relevant)?
> > > In other words, the test collection is biased against you because you
> > > did not participate in the pooling process.
> > >
> > > If you are concerned about this, you should still use (a), but
> > > perhaps look at other measures such as bpref
> > > (http://comminfo.rutgers.edu/~muresan/IR/Docs/Articles/sigirBuckley2004.pdf).
> > >
> > > Personally, I simply prefer to stick with MAP. And with all measures,
> > > whether you look at bpref or MAP, my advice is to consider only large
> > > differences when evaluating some potential improvement!
> > >
> > > On Sun, Feb 7, 2010 at 6:49 PM, Ivan Provalov <iprovalo@yahoo.com> wrote:
> > >
> > > > Robert,
> > > >
> > > > We are using TREC-3 data and Ad Hoc topics 151-200.  The relevance
> > > > judgments list contains 97,319 entries, of which 68,559 are unique
> > > > document ids.  The TIPSTER collection which was used in TREC-3 is
> > > > around 750,000 documents.
> > > >
> > > > Should we (a) index the entire 750,000 document collection, or (b)
> > > > the document collection of the 68,559 unique documents listed in
> > > > the qrels, or (c) should we limit our index to each specific topic
> > > > (about 2,000 docs) i.e. to the documents listed for a particular
> > > > topic in the qrels?
> > > >
> > > > Thanks,
> > > >
> > > > Ivan
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > > >
> > > >
> > >
> > >
> > > --
> > > Robert Muir
> > > rcmuir@gmail.com
> > >
> >
> 
> 
> -- 
> Robert Muir
> rcmuir@gmail.com
> 
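
Note on the percentages in the thread: the "around 15% and 24%" in Robert's reply are relative improvements over the 0.141 baseline, whereas the 0.022 and 0.034 Ivan reported are absolute MAP differences. A minimal Python sketch of that arithmetic follows. The function name `relative_improvement` and the constant names are made up for illustration; only the three MAP values come from the thread.

```python
def relative_improvement(baseline, score):
    """Return the gain of `score` over `baseline` as a fraction of the baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (score - baseline) / baseline


# MAP values quoted in the thread.
BASELINE_MAP = 0.141     # default similarity
SWEET_SPOT_MAP = 0.163   # Sweet Spot Similarity
LNB_LTC_MAP = 0.175      # LnbLtcSimilarity

sweet_spot_gain = relative_improvement(BASELINE_MAP, SWEET_SPOT_MAP)
lnb_ltc_gain = relative_improvement(BASELINE_MAP, LNB_LTC_MAP)

print(f"Sweet Spot: {sweet_spot_gain:.1%}")  # roughly 15.6%
print(f"LnbLtc:     {lnb_ltc_gain:.1%}")     # roughly 24.1%
```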

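Note on "various statistical tests": the thread does not say which test to use. One common choice for comparing two runs over the same topics is a paired randomization (sign-flip) test on per-topic average precision, sketched below in Python. Everything here is an assumption for illustration: the function name, the number of trials, and the sample AP values, which are synthetic placeholders and not results from the thread. A small p-value would still be subject to the pooling bias Robert describes.

```python
import random


def paired_randomization_test(ap_a, ap_b, trials=10000, seed=0):
    """Two-sided sign-flip randomization test on per-topic AP differences.

    Returns an estimated p-value for the null hypothesis that the two
    runs are interchangeable on each topic.
    """
    if len(ap_a) != len(ap_b):
        raise ValueError("both runs must cover the same topics")
    if not ap_a:
        raise ValueError("need at least one topic")

    diffs = [a - b for a, b in zip(ap_a, ap_b)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)

    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Under the null, the sign of each per-topic difference is arbitrary.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped / n) >= observed - 1e-12:
            hits += 1

    # Add-one smoothing so the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)


# Synthetic per-topic AP values, for illustration only.
run_a = [0.10, 0.25, 0.40, 0.05, 0.30]
run_b = [0.12, 0.20, 0.45, 0.05, 0.33]
print(paired_randomization_test(run_a, run_b))
```

With only 50 topics (151-200), per-topic variance is usually large, which is one reason to treat small MAP differences between two similarities with caution.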

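Note on MAP versus bpref: the difference Robert points to is how unjudged documents are treated. Average precision counts every retrieved document that is not known to be relevant against the run, while bpref only looks at judged documents. The Python sketch below follows the basic bpref formulation as I understand it from the Buckley and Voorhees paper linked in the thread; trec_eval's implementation differs in detail, so treat this as an illustration and not a replacement for it. The function names, document ids, and qrels are hypothetical.

```python
def average_precision(ranking, qrels):
    """AP for one topic; unjudged documents count as non-relevant."""
    relevant = {doc for doc, rel in qrels.items() if rel > 0}
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def bpref(ranking, qrels):
    """Basic bpref for one topic; unjudged documents are ignored."""
    relevant = {doc for doc, rel in qrels.items() if rel > 0}
    num_rel = len(relevant)
    if num_rel == 0:
        return 0.0
    nonrel_seen = 0  # judged non-relevant documents ranked so far
    total = 0.0
    for doc in ranking:
        if doc in relevant:
            # Only the first R judged non-relevant documents are counted.
            total += 1.0 - min(nonrel_seen, num_rel) / num_rel
        elif doc in qrels:
            nonrel_seen += 1
        # Documents missing from qrels (unjudged) change nothing.
    return total / num_rel


# Hypothetical topic: r1 and r2 judged relevant, n1 judged non-relevant,
# u1 and u2 never judged (not in the pool).
qrels = {"r1": 1, "r2": 1, "n1": 0}
ranking = ["u1", "r1", "u2", "r2"]

print(average_precision(ranking, qrels))  # 0.5
print(bpref(ranking, qrels))              # 1.0
```

In this toy ranking the two unjudged documents halve AP but leave bpref untouched, which is the effect a run that did not contribute to the pool may see.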
      
